1. INTRODUCTION

As modern high-performance aircraft improve in performance and maneuverability, their
design becomes more and more statically unstable. These aircraft depend on inner-loop
stability/control system augmentation to increase bare airframe stability while providing the pilot
with high performance and superior maneuverability under a wide range of flight conditions. This
poses a challenge to the flight control system (FCS) designer: the extreme range of flight
conditions introduces significant uncertainties and nonlinearities that the FCS design must allow
for. In addition, the FCS must be adaptive and reconfigurable in order to maintain stability if one
of the control effectors fails.

The current FCS design approach is 1) to generate linearized models of the flight dynamics
for a large set of trim conditions, 2) to use linear control-system theory to design controllers
and controller gains that are valid only for a limited region of the state/control space around the trim
point, and 3) to design a gain schedule by interpolating the gains between the different trim
conditions. While such a procedure has resulted in satisfactory FCS performance in the past, it has
many disadvantages: a time-consuming and expensive trial-and-error process; a lack of adaptability
of the design to changes in the dynamics; a difficulty in handling extremely nonlinear flight
conditions, such as those that occur at high angles of attack; and a tendency toward a conservative
design that improves robustness to uncertainties at the expense of reduced maneuverability.
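The interpolation in step 3 of this procedure can be illustrated with a minimal sketch. The trim speeds and gain values below are made-up placeholders rather than data from any aircraft, and the single scheduling variable (airspeed) is an assumption; a practical schedule would usually interpolate over several variables.

```python
from bisect import bisect_right

# Hypothetical schedule: one feedback gain designed at each trim airspeed (ft/sec).
# All numbers are illustrative placeholders.
TRIM_SPEEDS = [400.0, 600.0, 800.0]
TRIM_GAINS = [2.0, 1.2, 0.8]

def scheduled_gain(speed):
    """Linearly interpolate the gain between neighboring trim points.

    Outside the table, the gain of the nearest trim point is held constant.
    """
    if speed <= TRIM_SPEEDS[0]:
        return TRIM_GAINS[0]
    if speed >= TRIM_SPEEDS[-1]:
        return TRIM_GAINS[-1]
    hi = bisect_right(TRIM_SPEEDS, speed)   # first trim point above `speed`
    lo = hi - 1
    frac = (speed - TRIM_SPEEDS[lo]) / (TRIM_SPEEDS[hi] - TRIM_SPEEDS[lo])
    return TRIM_GAINS[lo] + frac * (TRIM_GAINS[hi] - TRIM_GAINS[lo])
```

Each table entry is only valid near its own trim point, which is why the quality of the schedule depends on how densely the trim conditions cover the flight envelope.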

One alternative to linear control methods is to use neural networks to control nonlinear
systems. An artificial neural network (ANN) approach to FCS design might provide a means of
eliminating or reducing many of the disadvantages of control methods that rely on linearized
models. Recently proposed types of neural networks can accomplish complex tasks such as pattern
classification, function approximation and generalization from examples, content-addressable
information retrieval, error correction, optimization, adaptation, and learning. These abilities of
neural networks might provide several potential advantages over the conventional FCS
control-design methods:

• Neural networks can approximate nonlinear smooth mappings arbitrarily closely, and this
might provide accurate models and nonlinear controllers that can achieve superior
performance and maneuverability over a wide range of flight regimes.

• Neural networks offer not only a rapidly adaptable but also an on-line learning solution.
Hence, neural networks offer benefits beyond those of adaptive control techniques. Rapid
adaptation capability has a significant value in a high-performance operational environment
with aircraft configurations, stores and missions that change constantly, while the learning
capability is crucial for robustness under hardware failures and control surface/body
damage.

• Optimizing neural networks, such as Hopfield-like recurrent networks, might give real-time
adaptive solutions to many of the control design problems. Many of these problems (e.g.,
pole assignment, optimal control design, parameter estimation and state observer design) can
be formulated as optimization of suitable objective functions. Optimizing neural networks
can provide a cheap real-time on-board solution to these optimization problems. This will
reduce the need for predesigned control and increase adaptability to both changes in mission
requirements and changes in dynamics, such as those that might result from battle damage.

• Neural networks can improve productivity in the off-line design of FCSs. ANNs can
automate every stage of the design process that involves trial-and-error. For example, an
ANN might be trained to identify the quality of a particular design, based on some
parameters. This network can then find the controller parameters that yield a good FCS
design.

•  ANNs  can  improve  implementation  efficiency  on  the  emerging  neural  computers.  Given  the 
availability  of  relatively  inexpensive  VLSI  designs  for  neural  network  structures,  FCS 
designs  which  could  not  be  implemented  in  real-time  on  serial  machines  can  now  be 
considered  for  practical  applications. 

1.1 Applications of Neural Networks in Control

There are many different ways that neural network techniques might help solve control
problems. Some of these techniques are summarized by Atkeson (1991). A recent survey of the
application of neural networks in control is given by Hunt, Sbarbaro, Zbikowski, et al. (1992).
Over the past few years, there have been many attempts at
applying neural networks in control, using different control and network architectures, with
varying degrees of success. Among the recent papers that have examined the application of neural
networks in control are Psaltis, Sideris and Yamamura (1988);
Marzuki and Omatu (1992); Narendra and Mukhopadhyay (1992); Schiffman
and Geffers (1993); Chen and Khalil (1992); Pao,
Phillips and Sobajic (1992); Levin and Narendra (1993); Kuschewski,
Hui and Zak (1993); Sanner and Slotine (1992); Jordan and Rumelhart
(1992); Atkeson (1991); and the papers cited therein. We will summarize in this section some of
the proposed neural-network applications in control.

An inverse model is the most direct way of using a neural network as a controller. The
inverse neural network model is simply an associative network, having as its input the desired
output(s) $x_d$ and the current states of the system $x$, and as its output the control action(s) that
should be applied. An inverse model of the dynamics can be used in a variety of configurations in a
control system. These different configurations will be discussed in a later chapter.

Another possible application for neural networks in control is to encode the outcome of a
control action as a function of the states and controls. This has been called forward modeling.
Unlike inverse models, forward models, by definition, always exist for deterministic systems. In
the case of forward models, finding the control command to use in order to reach a certain desired
state from a given state is not as straightforward as it is in the case of the inverse model. Root
finding or optimization techniques may be used to solve for the control commands as a function of
the states and desired change in the states (Atkeson, 1991).
Hoskins, Hwang and Vagners (1992) use back propagation techniques for the iterative inversion
of the forward model, and they propose its application in adaptive control. Forward models serve
as predictors of the behavior of the dynamic systems they model and can be used to optimize controllers.
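The iterative inversion of a forward model can be sketched as a gradient descent on the control input alone. The scalar forward model below is a hypothetical stand-in for a trained network, and its derivative is written analytically where a network implementation would obtain it by back propagation; the step size and iteration count are likewise assumptions.

```python
import math

def forward_model(x, u):
    """Hypothetical scalar forward model: next state from state x and control u."""
    return 0.5 * x + math.tanh(u)

def forward_model_du(x, u):
    """Sensitivity of the predicted next state to the control, df/du."""
    return 1.0 - math.tanh(u) ** 2

def invert_forward_model(x, x_desired, u_init=0.0, rate=1.0, steps=100):
    """Find u such that forward_model(x, u) is close to x_desired.

    Minimizes 0.5 * (f(x, u) - x_desired)**2 with respect to u only;
    the forward model itself is left unchanged.
    """
    u = u_init
    for _ in range(steps):
        error = forward_model(x, u) - x_desired
        u -= rate * error * forward_model_du(x, u)
    return u

u_cmd = invert_forward_model(x=0.2, x_desired=0.6)
```

The same loop fails to converge when the desired state is not reachable from the current state, which is one reason the inverse problem is harder than the forward one.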

There  might  be  more  than  one  neural  network  in  a  single  control  system.  For  example, 
forward  models  can  be  used  for  a  predictor  model  and  an  inverse  model  for  a  controller.  The  role 
of the plant-dynamics forward model is to predict the response of the system and to propagate back
the  error  in  the  output  (Jordan,  1989).  More  generally,  it  can  propagate  back  any  performance 
gradient  in  order  to  adjust  the  controller  parameters.  An  inverse  model  might  make  an  initial  guess 
for  the  root  finding  algorithm  for  the  forward  model  and  correct  the  errors  in  approximating  the 
forward  model. 

Neural networks used as pattern classifiers might also have applications in control. The
classifier acts as a nonlinear switch between a discrete set of controllers that is based on
measurements of the states and desired outputs of the system and the desired mission. The switch
can also change smoothly from one controller to the next. This can be thought of as a more
generalized method of gain scheduling. The controllers themselves need not be fixed and can be
some other function approximators or neural networks. This type of use of neural network
classifiers in control may be useful, for example, when the controller function varies considerably
in the different areas of the state space or depends heavily on the desired mission. This is one
possible application for the competitive networks paradigm described by Jacobs et al. (Jacobs,
Jordan, Nowlan, et al., 1991; Jacobs and Jordan, 1993). Narendra and Mukhopadhyay (1992)
have also suggested the use of neural network classifiers as a switch to select a controller for the
case when it is known a priori that the controlled plant can only be in one of a finite number of
configurations.

Neural networks can be state observers for state feedback control when not all the states
can be measured. Recurrent neural networks have been previously proposed and tested for such
tasks. The states encoded with such networks may not necessarily have a physical meaning, but
can be a nonlinear function of meaningful state variables.

Another  potential  use  of  neural  networks  in  control  is  to  build  a  model  of  some  measure  of 
performance  of  the  system  as  a  function  of  some  parameters  and  then  use  this  model  to  find  the 
parameters  that  optimize  this  measure  of  performance,  using  nonlinear  optimization  techniques. 

Recurrent neural networks of the Hopfield type might serve as cheap real-time computing
elements.  Variants  of  such  networks  have  been  proposed  for  continuous  and  combinatorial 
optimization  and  linear  algebra  problems  such  as  computation  of  matrix  inverses,  the  solution  of 
general  matrix  equations,  the  estimation  of  eigenvalues  and  singular  value  decomposition  (Cichocki 
and  Unbehauen,  1992). 

1.2 Application of ANN in FCS

Specific applications of ANNs in FCS include:

•  automatic  trim  computation 

•  gain  scheduling 

• adaptive and optimal control

•  identification  of  nonlinear  dynamics 

•  on-line  optimization  of  handling  quality 

•  self-repairing  flight  control 

•  automatic  trajectory  guidance 

•  integrated  fire/flight  control 

Some of the potential applications of neural networks and fuzzy logic in flight control are
summarized in Steinberg (1992). Recently, there have been many reports describing applications
of neural networks to flight control problems. For instance, DiGirolamo (1992)
trained a feedforward neural network to generate control gain schedules based on measurements of
the states. The method was successfully applied to the tracking of the pitch rate of a nonlinear
longitudinal F/A-18 model. Similarly, Sadeghi, Tascillo, Simons, et al. (1992) trained a
feedforward network as a nonlinear feedback controller. They found that the performance of the
neural network was inferior to that of a self-tuning adaptive control law. Linse (1990) modeled the
longitudinal trim state of a commercial transport aircraft. Ahmed-Zaid and colleagues (Ahmed-Zaid,
Ioannou, Polycarpou, et al., 1992) modeled the pitch dynamics of an F-16 aircraft using a
radial basis functions (RBF) neural network. They then used the neural network model and its
partial derivatives to control the pitch dynamics. The neural network approach was found to be
superior to a simple linear controller. Caglayan and Allen (1990) trained a multilayer perceptron
to model optimum guidance trajectories for the aeroassisted orbital plane change scenario. Calise,
Kim, Kam, et al. (1992) trained two neural networks to model the inverse dynamics of the rolling
rate. The neural networks were able to generate the differential deflection of the tail surfaces as a
function of the roll rate command, the aircraft state vector and the rudder deflection. Rokhsaz and
Steck (1993) used feedforward neural networks to model the nonlinear aerodynamics and flight
dynamics of aircraft under different conditions. They conclude that neural networks have good
generalization properties for the aerodynamic modeling problems, but do not perform as well in
modeling the flight dynamics. Troudet, Garg and Merrill (1993) propose the use of a feedforward
neural network as a controller for command tracking and apply it to the control of a linearized
model of the longitudinal dynamics of an aircraft. They conclude that although the nominal
performance of the neurocontroller is better than that of a standard $H_\infty$ controller, the stability
characteristics are poor. The use of a Hopfield neural network in synthesizing the optimal inputs
for a command tracker FCS has been demonstrated in Mears, Smith, Chandler, et al. (1993).
Application of neural networks to failure detection problems in flight control has been reported by
Caglayan and Allen (1990) and Barron, Celluci and Jordan (1990).

Few of the published reports show negative results, and most of the results reported above
show promising use of neural networks for different functions of FCS. We believe that a
successful FCS design should exploit the abilities of neural networks, namely the ability to map
smooth nonlinear functions with high accuracy given few observations, fast computation, and
learning and adaptability, while at the same time incorporating domain-specific knowledge about
the particular dynamics and the existing FCS design expertise that has accumulated over the
years.

1.3  Goals  of  the  Current  Study 

The objective of the current study is to explore and evaluate the feasibility of using different
neural network architectures for a self-designing FCS, one which can continuously optimize
performance and accommodate changing mission requirements, failures in hardware, and battle
damage. We will focus in particular on the neural implementation of three major functions
associated with FCS design, namely the modeling of the inverse dynamics and inverse trim,
parameter estimation and optimal control. For each of these functions we will develop a neural
network implementation, and we will evaluate its performance using a high-performance aircraft
simulation. We plan to explore and analyze some of the practical problems associated with the
implementation of each of these neural network modules for FCS, and to propose possible
alternative solutions.

1.4  A  Neural  FCS  for  Longitudinal  Dynamics 

The  proposed  self-designing  FCS  architecture  that  we  will  explore  is  as  shown  in 
figure  1.4-1. 


Figure  1.4-1:  Proposed  Neural  FCS  Controller  Architecture 


The inverse model computes the control commands as a function of the desired state
trajectory, generated by the pilot and the handling qualities model, and of the measured current
state vector. The optimal control network computes an optimal trajectory perturbation based on
measurements of the current state vector of the aircraft, an objective function, and the estimated
parameters supplied by the parameter estimation module. The parameter ID network provides an
estimate of the Jacobian of the dynamic response of the system with respect to the state and control
vectors, using as input the previous histories of the control and state trajectories.

It is important to emphasize here that the proposed FCS design, shown in figure 1.4-1, is
chosen only to show the power of neural networks in FCS design. No attempt is made to optimize
the design with respect to the issues of robustness to noise or sensitivity to modeling inaccuracies.
Other feedback-control loops might improve stability and robustness in the current neural
controller. For instance, the inverse model can be used to linearize the aircraft nonlinear dynamics.
A robust optimal linear feedback controller is then used for noise rejection and increased stability of
the resultant linear system.

A nonlinear model of the longitudinal dynamics of an F/A-18 aircraft is used to test the
developed neural network controllers. The state vector used in the simulations reported here is
composed of five state variables: the angle of attack ($\alpha$), the pitch angle ($\theta$), the pitch rate ($q$),
the total speed ($v_t$) and the altitude ($h$). Two control effectors are used in the simulation: the
engine thrust ($\delta_{thr}$) and the average elevator deflection.

1.5  Achieved  Objectives 

• We propose and develop different optimizing recurrent neural network architectures which
satisfy the dynamic constraints for the optimal control module. We analyze and compare the
performance of the different architectures with respect to ease of implementation, accuracy,
robustness and speed of computation. We discuss how to implement limits on the states
and/or controls and how to extend these neural network architectures to find the optimal
control of nonlinear systems.

• We propose and develop a parameter estimation neural network which can implement
optimization-based parameter estimation algorithms. We discuss methods for implementing
forgetting factors and relationships between parameters. We also propose a feedforward
neural network for estimating the Jacobian of the nonlinear dynamic equations (stability and
control derivatives) based on measurement of the current state and control vectors of the
aircraft. We propose a parameter ID module which uses the optimization-based neural
network in parallel with a feedforward neural network to reduce the need for dithering
signals for parameter identification and help identify failures. We also discuss efficient
Hopfield-like neural network architectures for implementing state space parameter
identification techniques that are based on subspace methods (Moonen, DeMoor,
Vandenberghe, et al., 1992; Swindlehurst, Roy, Ottersten, et al., 1992).

• We implement and test a radial basis functions (RBF) neural network which models the
inverse flight dynamics. The network is found to perform well on training and test sets as
well as in tracking simulated trajectories. In addition, we train an RBF neural network to
estimate the trim controls and angle of attack, given desired aircraft altitude and speed. We
discuss methods of training such a network on-line and incorporating the inverse dynamics
in the control loop.

2. THE OPTIMAL CONTROL MODULE

The role of the optimal control module is to find the control inputs which achieve a desired
goal optimally. The desired goal is formulated as an objective functional to minimize. For the FCS
design proposed in this report and shown in figure 1.4-1, the goal of the optimal controller is only
to optimize the control command perturbations ($\delta u$) around the current operating point. The output
of the inverse model network determines the operating-point control command ($u_c$) itself. Since the
state and control perturbations ($\delta x$ and $\delta u$) around the operating point are assumed to be small,
linearized models can be used. The parameter identification module, described in the next chapter,
will provide the linearized system parameters that will be used by the optimal controller.

Current  linear  optimal  controller  design  involves  finding  the  optimal  feedback  steady-state 
gains  by  solving  an  algebraic  Riccati  equation,  which  is  a  function  of  the  system  parameters  and 
the  objective  functional  weights.  It  must  be  noted  here  that  the  derived  feedback  gains  are  only 
optimal  at  steady  state.  For  objective  functions  that  have  a  small  time  horizon  compared  to  the 
system  dynamics,  or  are  explicit  functions  of  time,  the  optimal  feedback  gains  are,  in  general,  time 
varying.  For  nonlinear  systems,  many  constant  linear  feedback  gains  are  derived  at  different 
operating  points,  then  gain  scheduling  is  used  to  interpolate  the  value  of  the  feedback  gains  to  be 
used at the current operating point.

In the current work, we study a different approach for implementing the optimal controller.
Instead of providing the closed-loop optimal feedback gains, the optimal controller module
computes the optimal open-loop trajectory, based on the current state of the system and the current
objective function. The optimal control trajectory is recomputed at each instant of time. Since the
optimal trajectory generated always depends on the current state vector, this approach, in essence,
is equivalent to a closed-loop optimal controller. However, this approach differs from a closed-loop
controller in that the equivalent feedback gains here are, in general, time varying in the case of
a limited horizon objective function, even when the system dynamics are linear and time invariant.
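The time variation of the optimal gains under a limited horizon can be seen in a small sketch. It applies the standard backward Riccati recursion to a scalar linear time-invariant system; the numerical values of `a`, `b`, `q`, `r` and the horizon are arbitrary assumptions chosen only for illustration.

```python
def finite_horizon_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x(k+1) = a x(k) + b u(k).

    Returns gains K(0..N-1) such that u(k) = -K(k) x(k) minimizes
    q_final * x(N)^2 + sum over k of [q * x(k)^2 + r * u(k)^2].
    """
    p = q_final                      # cost-to-go weight at the final time
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        gains[k] = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * gains[k]
    return gains

gains = finite_horizon_gains(a=1.0, b=0.5, q=1.0, r=1.0, q_final=0.0, horizon=30)
print(gains[0], gains[-1])
```

Far from the end of the horizon the gains settle to the steady-state value that an algebraic Riccati equation would give, while near the end they change from step to step even though the system itself is time invariant.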

In order to achieve closed-loop performance, the open-loop controller should be able to
observe the current state of the system and compute the optimal trajectory in real-time. Practically,
this is a difficult task due to the amount of computation involved in finding the optimal trajectory.
Recently, a few researchers have proposed using optimizing neural networks, such as Hopfield
networks, to find the optimal trajectory for optimal control problems of linear systems (Lan and
Chand, 1990; Mears, et al., 1993). In the current study, we extend the work of previous
researchers and discuss different novel approaches for mapping optimal control problems into
Hopfield-like neural networks. We explore the advantages and disadvantages of the different
techniques. We test the proposed techniques on linearized F/A-18 dynamics, and we compare the
performance of the Hopfield controllers with the performance of an optimal controller derived by
solving the necessary conditions of optimality.

2.1 Function Optimization Using Hopfield Networks

Hopfield networks with continuous valued units belong to the class of recurrent neural
networks. A major difference between this class of networks and feedforward neural networks is
the presence of dynamics. The architecture of a Hopfield network is shown schematically in figure
2.1-1.


Figure 2.1-1: A Schematic Representation of a Hopfield Network


The input to each unit is a weighted summation of the output of all the other units as
represented by equation (2.1-1). The output of each unit is a monotonically increasing function of
the input, called the activation function.

$$v_i = g(u_i) = g\Big(\sum_j w_{ij}\, v_j\Big) \tag{2.1-1}$$

where $u_i$ and $v_i$ are the input and output of unit $i$ respectively and $g(\cdot)$ is the activation function.
The connection weights $w_{ij}$ form a symmetric connectivity matrix. Three different methods have
been suggested for finding the output of the different units at steady state: asynchronous,
synchronous and continuous. In the asynchronous update, each unit is updated randomly and
independently from all other units. In the synchronous update, the units are updated
simultaneously at each clock cycle. In the continuous update, the output of each unit is represented
by a first-order nonlinear differential equation of the form of equation (2.1-2).

$$\frac{dv_i}{dt} = -v_i + g(u_i) \tag{2.1-2}$$

In this study, we are only interested in the continuous update, since it is easy to implement
using analog hardware. It has been shown, using Lyapunov stability theorems, that for a
symmetric connectivity matrix with zero diagonal elements, and using a monotonically increasing
function, the set of equations (2.1-2) represents a stable system.

To use a Hopfield network for the minimization of a multivariable cost function we follow
the following steps (Hertz, Krogh and Palmer, 1991):

1. Define an energy function $H(v)$ which is bounded from below. The minimum of the energy
function should correspond to the minimum of the cost function to be optimized.

2. Use $v_i = g(u_i)$, where $g(\cdot)$ is a monotonically increasing function.

3. Use the following update equation:

$$\frac{du_i}{dt} = -\frac{\partial H(v)}{\partial v_i} \tag{2.1-3}$$

4. The connectivity matrix of the network is determined by the function $\partial H(v)/\partial v_i$. For
example, if the energy function $H(v)$ has a quadratic form:

$$H(v) = v^T W v \tag{2.1-4}$$

then the connectivity matrix is equal to $W$.

The system of differential equations (2.1-3) will converge to a local minimum of the energy
function.  In  the  case  of  a  convex  energy  function,  such  as  is  the  case  with  quadratic  objective 
functions,  the  equations  will  converge  to  the  global  minimum.  For  complex  nonlinear  objective 
functions,  it  is  possible  to  improve  the  quality  of  the  solution  by  using  techniques  such  as  mean 
field  annealing. 
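The minimization procedure can be sketched numerically under a few stated assumptions: the update is taken in the common gradient form $du_i/dt = -\partial H/\partial v_i$, the activation is $g = \tanh$, the quadratic energy is extended with a linear term $-b^T v$ so that its minimum is not at the origin, and the continuous dynamics are approximated by Euler steps. The matrix `W`, the vector `b`, and the step size are illustrative values only.

```python
import math

# Assumed example energy: H(v) = v^T W v - b^T v, with W symmetric positive definite.
W = [[2.0, 0.5],
     [0.5, 1.0]]
b = [0.8, -0.5]

def energy_gradient(v):
    """dH/dv_i = 2 * sum_j W_ij v_j - b_i for the quadratic energy above."""
    return [2.0 * sum(W[i][j] * v[j] for j in range(2)) - b[i] for i in range(2)]

def run_network(steps=2000, dt=0.05):
    """Euler integration of du_i/dt = -dH/dv_i with v_i = tanh(u_i)."""
    u = [0.0, 0.0]
    for _ in range(steps):
        v = [math.tanh(ui) for ui in u]
        grad = energy_gradient(v)
        u = [ui - dt * gi for ui, gi in zip(u, grad)]
    return [math.tanh(ui) for ui in u]

v_final = run_network()
```

Because the energy is convex and its minimizer lies inside the range of the activation function, the unit outputs settle at the point where the gradient vanishes, here $2Wv = b$.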

2.2  Optimal  Control  as  a  Static  Optimization  Problem 

We will restrict our analysis here to discrete dynamic systems. Continuous dynamic
systems can be approximated with discrete systems. A discrete linear-quadratic optimal control
problem described by the quadratic objective function represented by equation (2.2-1)

$$J(x,u) = x^T(N)\, Q_N\, x(N) + \sum_{k=0}^{N-1} \left[ x^T(k)\, Q_k\, x(k) + u^T(k)\, R_k\, u(k) \right] \tag{2.2-1}$$

and the dynamic system equations (2.2-2)

$$x(k+1) = A(k)\, x(k) + B(k)\, u(k), \qquad k = 0, \ldots, N-1 \tag{2.2-2}$$

$$x(0) = x_0$$

where $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^m$, can be formulated as a static constrained optimization problem as
shown in equations (2.2-3) and (2.2-4).

$$\min_v J(v) \tag{2.2-3}$$

subject to:

$$M^T v = c \tag{2.2-4}$$

where:

$$v = \begin{bmatrix} x(1) \\ x(2) \\ \vdots \\ x(N) \\ u(0) \\ u(1) \\ \vdots \\ u(N-1) \end{bmatrix} \tag{2.2-5}$$

is the $N(n+m)$ vector of state and control variables, $H$ (2.2-6) is an $N(n+m) \times N(n+m)$ block
diagonal matrix, $M$ (2.2-7) collects the dynamic constraints (2.2-2), and $c$
depends on the initial and boundary conditions.
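One way to make the static formulation concrete is to build the stacked quantities for a small scalar, time-invariant system ($n = m = 1$) and check them against a simulated trajectory. The explicit layouts used below, $H = \mathrm{diag}(Q_1, \ldots, Q_N, R_0, \ldots, R_{N-1})$ and one row of $M^T$ per time step, are assumptions that follow from the ordering of $v$ in (2.2-5) and from (2.2-1) and (2.2-2); the function name and numerical values are hypothetical.

```python
def build_static_problem(a, b, q, q_final, r, x0, N):
    """Return (h_diag, Mt, c) for x(k+1) = a x(k) + b u(k), v = [x(1..N), u(0..N-1)]."""
    h_diag = [q] * (N - 1) + [q_final] + [r] * N   # diagonal of H
    Mt = [[0.0] * (2 * N) for _ in range(N)]       # M^T: one row per time step k
    for k in range(N):
        Mt[k][k] = 1.0            # coefficient of x(k+1)
        if k > 0:
            Mt[k][k - 1] = -a     # coefficient of x(k)
        Mt[k][N + k] = -b         # coefficient of u(k)
    c = [a * x0] + [0.0] * (N - 1)  # the known x(0) moves to the right-hand side
    return h_diag, Mt, c

# Simulate an arbitrary (not optimal) control sequence and stack it into v.
a, b, q, q_final, r, x0 = 0.9, 0.5, 1.0, 2.0, 0.1, 1.0
controls = [0.3, -0.2, 0.1]
states = [x0]
for u in controls:
    states.append(a * states[-1] + b * u)
v = states[1:] + controls

h_diag, Mt, c = build_static_problem(a, b, q, q_final, r, x0, N=len(controls))

# Any trajectory that obeys the dynamics satisfies M^T v = c.
residual = [sum(m * vi for m, vi in zip(row, v)) - ck for row, ck in zip(Mt, c)]

# v^T H v equals the original cost J minus the constant term q * x0^2.
static_cost = sum(h * vi * vi for h, vi in zip(h_diag, v))
direct_cost = q_final * states[-1] ** 2 + sum(
    q * x * x + r * u * u for x, u in zip(states[:-1], controls))
```

The term in $x(0)$ is fixed by the initial condition, so it drops out of the minimization and appears only as a constant offset between the two costs.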

2.3 Hopfield Networks for Optimal Control


We will discuss in this section different possible implementations of the optimal controller
using Hopfield-like networks. This work represents an extension to the Hopfield optimal
controllers proposed previously by Lan and Chand (1990) and Mears, et al. (1993), which are
based on the penalty function methods for constrained optimization. Using computer simulations,
we will compare the following three implementations:

• The penalty method

• The gradient projection algorithm

• The Lagrange multipliers method


Figure 2.3-1: Optimal Controller (inputs: initial state $x_0$; estimated system dynamics, A and B matrices; specifications: desired trajectory, Q and R matrices, limits on states and/or controls, time horizon; outputs: $x(t)$, $u(t)$, $\lambda(t)$)

A diagram of a general Hopfield optimal controller is shown in figure 2.3-1. The network
uses the estimated system dynamics and the measured current state of the system to compute the
optimal trajectories $x(t)$ and $u(t)$ with respect to the given specifications. Depending on the method
used, the costate trajectories $\lambda(t)$ may also be computed as a byproduct of the optimizing network.

2.3.1  Optimal  Control  using  Penalty  Functions 

Lan and Chand (1990) and Mears, et al. (1993) have proposed transforming the
constrained optimization problem (equations 2.2-3 and 2.2-4) into an unconstrained problem, by
appending the constraints to the objective function as an additional penalty term, as shown in
equation (2.3.1-1):

$$J'(v) = v^T H v + K\, \| M^T v - c \|^2 \tag{2.3.1-1}$$

where $K \gg 0$ is the penalty coefficient. Equation (2.3.1-1) defines an energy function which can
then be minimized with respect to $v$ using equation (2.3.1-2):

$$\frac{dv}{dt} = -\frac{\partial J'}{\partial v} \tag{2.3.1-2}$$


Equation (2.3.1-2) can be implemented using a recurrent Hopfield-like neural network with
$N(n+m)$ processing units, representing the components of the vector $v$. Stability is easy to prove,
since equation (2.3.1-2) represents a gradient descent equation on an energy function which has a
lower bound. Since the energy function $J'$ is a convex function, convergence to the global
minimum of $J'$ is guaranteed. However, it must be kept in mind that the global minimum of $J'$ is
only an approximation to the correct global minimum of the objective function $J$. The connectivity
matrix and bias inputs for the different processing units can be derived from the Jacobian matrix
$\partial J'/\partial v$ by expanding equation (2.3.1-2), as shown below:

$$\frac{dv}{dt} = -W v + K M c \tag{2.3.1-3}$$

where $W = H + K M M^T$ represents the connectivity matrix and $K M c$ represents a bias input.
A common factor of 2, which arises from differentiating the quadratic terms, only rescales time and is omitted.

There are some disadvantages to the above implementation of optimal control. These
disadvantages are mainly due to the penalty function implementation of the constraints. In order not
to violate the dynamic constraints, the penalty coefficients should be very high. This makes the
optimization problem ill-conditioned and consequently results in very slow convergence to the
optimal point. Reducing the penalty coefficient, on the other hand, may give completely
erroneous results, due to the possible violation of the dynamic constraints. A partial solution to this
problem is to use a time-varying penalty coefficient: the value of the penalty coefficient is
gradually increased as the solution approaches the optimal point.

Although the penalty function technique may work well for constrained optimization
problems with few constraints, it does not yield good results when the number of constraints is
very large, as is the case for optimal control with a long time horizon.
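The trade-off between accuracy and conditioning can be checked numerically. The following is a minimal sketch, not the implementation used in this study: H, M and c are small random stand-ins (assumptions) for the matrices of equations (2.2-3) and (2.2-4), and the penalty dynamics are taken as dv/dt = -(H + K M Mᵀ)v + K M c. The helper names `penalty_equilibrium` and `constrained_optimum` are hypothetical.

```python
import numpy as np

def penalty_equilibrium(H, M, c, K):
    """Equilibrium of dv/dt = -(H + K M M^T) v + K M c, plus cond(W)."""
    W = H + K * (M @ M.T)
    v = np.linalg.solve(W, K * (M @ c))
    return v, np.linalg.cond(W)

def constrained_optimum(H, M, c):
    """Exact minimizer of v^T H v subject to M^T v = c (H positive definite)."""
    Hinv_M = np.linalg.solve(H, M)
    return Hinv_M @ np.linalg.solve(M.T @ Hinv_M, c)

rng = np.random.default_rng(0)
nv, nc = 8, 3                        # hypothetical problem size
A = rng.standard_normal((nv, nv))
H = A @ A.T + nv * np.eye(nv)        # symmetric positive definite cost matrix
M = rng.standard_normal((nv, nc))    # constraint matrix (full column rank)
c = rng.standard_normal(nc)

v_exact = constrained_optimum(H, M, c)
errors, conds = [], []
for K in (1e1, 1e3, 1e5):
    v, cond = penalty_equilibrium(H, M, c, K)
    errors.append(np.linalg.norm(v - v_exact))
    conds.append(cond)
    print(f"K = {K:.0e}: error = {errors[-1]:.2e}, cond(W) = {cond:.2e}")
```

In this sketch the distance to the exact constrained optimum shrinks as K grows, while the condition number of W grows with K, which is the behavior described above.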

2.3.1.1 Flight Control Example

To illustrate the problems associated with the penalty method in finding the optimal control
trajectories, we tested its performance on an aircraft optimal control problem. The aircraft model
used is a 5th order linearized model, obtained by trimming a nonlinear model of the longitudinal
dynamics of an F/A-18 aircraft at an altitude of 40,000 ft and a speed of 700 ft/sec. The states are
the perturbations in speed, angle of attack, pitch angle, pitch rate and altitude. The control variables
are the perturbations in the stabilator angle ($\delta_e$) and thrust ($\delta_{th}$). The goal is to increase the altitude
by 1000 ft in 5 sec, with minimum perturbations in the other aircraft states. A ramp function with a
slope of 200 ft/sec is used to represent the change in altitude. The aircraft dynamics were sampled
every 0.1 sec. This results in a Hopfield network with 350 units. Two different simulations are
shown in figures 2.3.1.1-1 and 2.3.1.1-2 for a penalty coefficient equal to 10^[?] and 10^[??]
respectively. Shown in the figures are the steady state values achieved by the networks (solid lines)
compared to the exact optimal trajectories (dotted lines). As shown in the figures, even with a
penalty as high as 10^[??] there is a big discrepancy between the exact optimal trajectories and those
achieved by the networks. It is also important to note that at a value of the penalty equal to 10^[??] the
network convergence is extremely slow and may not be useful for real-time applications.

Figure 2.3.1.1-1: Optimal Trajectories Using a Penalty Hopfield Network with K = 10^[?]

Figure 2.3.1.1-2: Optimal Trajectories Using a Penalty Hopfield Network with K = 10^[??]


2.4  Constraint-Satisfying  Hopfield  Networks 

In this section, we describe a class of Hopfield-type networks which converge to an
equilibrium point that satisfies the linear constraints that represent the dynamic system (2.2-4)
exactly. These networks are based on well-known constrained optimization techniques.
Convergence to the global optimum and stability can be proven for the case of linear systems with
quadratic costs. In this chapter, we will present two networks, one based on the gradient projection
algorithm and the second based on the Lagrange multipliers method.

2.4.1 Gradient Projection Hopfield Network

One possible exact numerical solution to the optimal control problem is through the use of
gradient projection (Kirk, 1970). The idea behind this approach is to project the gradient descent
equation of the objective function onto the hypersurface representing the dynamic constraints, as
summarized below.

Define the projection matrix

$$P = I - M (M^T M)^{-1} M^T$$    (2.4.1-1)


The projection matrix projects any change in the vector v onto the hypersurface of the
constraints, provided that we start from a valid solution that satisfies the constraints. The projection
matrix is symmetric, idempotent and positive semidefinite. If we then use the update rule (2.4.1-2),
it is possible to prove that the equations will converge to the globally optimal solution, given an
initial feasible trajectory.

$$\frac{dv}{dt} = -\epsilon P \frac{\partial J}{\partial v}$$    (2.4.1-2)

which can be written as

$$\frac{dv}{dt} = -\epsilon W v$$    (2.4.1-3)

where $W = P H$ defines the connectivity matrix. The value of the coefficient ε and the
connectivity matrix W determine the rate of convergence to the optimal trajectory.
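The properties claimed for the projection matrix, and the convergence of the projected update rule, can be illustrated on a small problem. This is a sketch under stated assumptions rather than the network used in this study: H, M and c are random stand-ins, the flow is integrated with a simple Euler scheme, and the coefficient ε is folded into the step size `eps`. The function names are hypothetical.

```python
import numpy as np

def projection_matrix(M):
    """P = I - M (M^T M)^{-1} M^T projects onto the null space of M^T."""
    return np.eye(M.shape[0]) - M @ np.linalg.solve(M.T @ M, M.T)

def gradient_projection_flow(H, M, c, steps=2000):
    """Euler integration of dv/dt = -eps * P H v from a feasible starting point."""
    P = projection_matrix(M)
    eps = 1.0 / np.linalg.norm(H, 2)          # step size chosen for a stable Euler scheme
    v = M @ np.linalg.solve(M.T @ M, c)       # feasible start: M^T v = c
    for _ in range(steps):
        v = v - eps * (P @ (H @ v))
    return v, P

rng = np.random.default_rng(1)
nv, nc = 6, 2                                 # hypothetical problem size
A = rng.standard_normal((nv, nv))
H = A @ A.T + nv * np.eye(nv)                 # symmetric positive definite
M = rng.standard_normal((nv, nc))
c = rng.standard_normal(nc)

v_flow, P = gradient_projection_flow(H, M, c)

# Reference solution from the optimality conditions H v = M lam, M^T v = c.
Hinv_M = np.linalg.solve(H, M)
v_exact = Hinv_M @ np.linalg.solve(M.T @ Hinv_M, c)
print("distance to exact optimum:", np.linalg.norm(v_flow - v_exact))
print("constraint residual:", np.linalg.norm(M.T @ v_flow - c))
```

Because every update lies in the null space of Mᵀ, the constraint residual of the starting point is preserved throughout, which is why the network must be initialized with a feasible trajectory.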

2.4.1.1 Equivalence Between Gradient Projection and the Penalty Method

It is important to note here that the gradient projection method can be viewed as a modified
penalty method with an infinite penalty coefficient, and hence an exact solution, which at the same
time possesses a relatively well-conditioned connectivity matrix. This result can be shown by changing
the update rule (2.3.1-3) for the penalty method to make it well-conditioned without changing the
equilibrium point. This can be easily done by multiplying the right-hand side of equation (2.3.1-3)
by a nonsingular square matrix F to obtain the following equation:

$$\frac{dv}{dt} = -F W v + F K M c$$    (2.4.1.1-1)

Since the matrix F, which has the same dimension as W, is chosen to be nonsingular, it
does not alter the equilibrium points of the original dynamic equations. It is only added to improve
the eigenvalue ratio of the linear system of equations (2.3.1-3). Since the ill-conditioning of the
connectivity matrix is mainly due to the penalty coefficient, a simple way to reduce this
ill-conditioning is by approximately canceling the effect of K. One possible choice of F for
performing this cancellation is given by equation (2.4.1.1-2):

$$F = \left[ E + K M M^T \right]^{-1}$$    (2.4.1.1-2)

where E is a symmetric positive definite matrix. The best value of E is of course H, which will
result in all eigenvalues being equal. Another possibility is to use:

$$E = I$$    (2.4.1.1-3)

where I is the identity matrix. Using the matrix inversion lemma (Bertsekas, 1982) and taking the
limit as K → ∞, the matrix F reduces to:

$$F = I - M (M^T M)^{-1} M^T$$    (2.4.1.1-4)

which is exactly the same as the projection matrix P defined above.
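The limit can be made explicit. The derivation below is a sketch under two assumptions that the text does not state: the penalty term enters F as K M Mᵀ (the form that has the same dimension as W), and M has full column rank so that MᵀM is invertible. With E = I, the matrix inversion lemma gives:

```latex
\begin{aligned}
F &= \left[ I + K M M^T \right]^{-1}
   = I - M \left( \tfrac{1}{K} I + M^T M \right)^{-1} M^T , \\
\lim_{K \to \infty} F &= I - M \left( M^T M \right)^{-1} M^T = P .
\end{aligned}
```

For any finite K the matrix F is nonsingular, so the equilibrium point is unchanged; only in the limit does F become the singular projector P.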

Although the gradient projection algorithm gives the exact optimal trajectory and has much
better convergence properties than the penalty method, it has some important disadvantages. One
disadvantage is that, unlike the penalty method, which results in a sparse connectivity matrix W, the
connectivity matrix that results from the gradient projection algorithm is not sparse. This may
represent a problem for parallel implementation. Another disadvantage is that the gradient
projection network should always be initialized to a valid solution. Any error in the
initialization will result in an error in the final solution. In addition, many of the eigenvalues of the
connectivity matrix W will be zero, due to the multiplication with the projection matrix P. This
makes the network less robust to imprecisions in the connectivity matrix values and requires exact
implementation of the connectivity matrix. The high precision and the dense connectivity make the
gradient projection network less attractive for hardware implementation.

2.4.1.2 Flight Control Example

We tested the gradient projection network on the same aircraft optimal control problem used
to test the penalty method. The gradient projection network is the same size as the penalty network,
although much more dense. The optimal trajectories obtained using the gradient projection network
are shown in figure 2.4.1.2-1 (solid lines). As predicted, the network gives an exact solution. The
results shown are after 10^[?] sec of simulation, using a time constant equal to 10^[?] sec.

Figure 2.4.1.2-1: Optimal Trajectories Using a Gradient Projection Hopfield Network


2.4.2 Lagrange Multipliers Hopfield Networks

Another approach for mapping a discrete linear quadratic optimal control problem to a
Hopfield neural network is through the use of Lagrange multipliers. If we define the matrix

$$L = \begin{bmatrix} H & -M \\ -M^T & 0 \end{bmatrix}$$    (2.4.2-1)

where the matrices H, M and c have the same definitions as before, then we can express the
necessary conditions of optimality (Kirk, 1970) in matrix form as follows:

$$L \begin{bmatrix} v \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ -c \end{bmatrix}$$    (2.4.2-2)

where λ is the (Nn × 1) vector of the discretized Lagrange multipliers.

The matrix L is nonsingular if the optimal trajectory is unique. One possible
implementation for the solution of the linear system of equations (2.4.2-2) using a Hopfield
network is to construct the energy function

$$E = \tfrac{1}{2} \left\| L z - b \right\|^2, \qquad z = \begin{bmatrix} v \\ \lambda \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ -c \end{bmatrix}$$    (2.4.2-3)

If we then construct the connectivity matrix

$$W = -L^T L$$    (2.4.2-4)

and use the update rule:

$$\frac{dz}{dt} = W z + L^T b$$    (2.4.2-5)

the dynamic system defined by equation (2.4.2-5) will be stable and will converge to the optimal
state, control and costate trajectories.

One serious problem with the above approach is that, although all the eigenvalues of the
matrix W are guaranteed to be negative and real for positive definite Q and R matrices, and always
stable for a detectable system,* the condition number of the matrix W is the square of the condition
number of the matrix L. This may result in ill-conditioning of the matrix W and consequently a
very slow convergence to the optimum point.

* A system is detectable if the unobservable eigenvectors are stable.

A better alternative that works well for linear systems with quadratic costs is to use the
duality property of the Lagrange function (Bertsekas, 1982), which states that the solution of the
constrained minimization problem is equivalent to the minimization over v and the maximization
with respect to λ of the Lagrangian function. We can then use a gradient descent (ascent) algorithm
to minimize (maximize) the Lagrangian function with respect to v (λ). This is guaranteed to
converge for linear systems with quadratic costs, since the Lagrange function is convex with
respect to v. For example, we can choose the connectivity matrix W to be

$$W = \begin{bmatrix} -H & M \\ -M^T & 0 \end{bmatrix}$$    (2.4.2-6)

In this case the condition number of W will be similar to that of the matrix L. The
eigenvalues of W will have a negative real part but they are complex in general, since the matrix W
is no longer symmetric. A more complex heuristic update rule for the Lagrange multipliers has
been proposed by Barhen, Gulati and Zak (1989). The update rule they propose is harder to
implement in hardware, and this study will not pursue it further.
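The two Lagrange multiplier networks can be compared on a small problem. The sketch below is illustrative only: H, M and c are random stand-ins, the sign conventions for L and for the primal-dual matrix W are assumptions chosen so that the multiplier dynamics are stable, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
nv, nc = 6, 2                                  # hypothetical problem size
A = rng.standard_normal((nv, nv))
H = A @ A.T + np.eye(nv)                       # symmetric positive definite
M = rng.standard_normal((nv, nc))
c = rng.standard_normal(nc)
zeros = np.zeros((nc, nc))

# Optimality conditions written as L z = b with z = [v; lambda].
L = np.block([[H, -M], [-M.T, zeros]])
b = np.concatenate([np.zeros(nv), -c])
z_star = np.linalg.solve(L, b)

# Network 1: gradient descent on 0.5 * ||L z - b||^2, so W = -L^T L.
W_ls = -L.T @ L

# Network 2: descent in v and ascent in lambda on the Lagrangian.
W_pd = np.block([[-H, M], [-M.T, zeros]])
bias_pd = np.concatenate([np.zeros(nv), c])

cond_L = np.linalg.cond(L)
cond_ls = np.linalg.cond(W_ls)
cond_pd = np.linalg.cond(W_pd)
eig_pd = np.linalg.eigvals(W_pd)

print(f"cond(L) = {cond_L:.3e}")
print(f"cond(-L^T L) = {cond_ls:.3e}  (square of cond(L))")
print(f"cond(W primal-dual) = {cond_pd:.3e}")
print("largest real part of the primal-dual eigenvalues:", eig_pd.real.max())
```

In this sketch the primal-dual matrix differs from L only by sign changes of block rows, so it has the same singular values as L, while its eigenvalues lie strictly in the left half plane.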

Although the Hopfield network implementation of the Lagrange multipliers method is
larger than the gradient projection network, it is much more sparse, which makes it much
easier to implement in hardware. In addition, since the connectivity matrix W is derived directly
from the system matrices, the computation time required to build the Hopfield network
implementing the Lagrange method is very small, which is better for real-time adaptive adjustment
of the network weights. A representation of the different inputs and connection strengths
associated with each unit is shown schematically in figure 2.4.2-1. In this figure, the circles
represent a set of units, for example the state, control and costate vectors at time step k.

Figure 2.4.2-1: Connectivity of a Lagrange Multipliers Recurrent Network


2.4.2.1 Flight Control Example

We tested the Lagrange multipliers network on the same aircraft optimal control problem
used in the previous sections. The network in this case is larger than the previous networks,
since the Lagrange multipliers network contains extra units for the costates (Lagrange multipliers).
The total number of units for a 0.1 sec sampling interval is 650. However, the
connectivity of the network is very sparse compared to the gradient projection network. The
optimal trajectories obtained using the Lagrange multipliers network are shown in figure 2.4.2.1-1
(solid lines) together with the exact trajectory obtained using the exact solution of the necessary
conditions of optimality. As predicted, the network gives an almost exact solution. The results
shown are after [?] sec of simulation, using a time constant equal to 10^[?] sec.

Figure 2.4.2.1-1: Optimal Trajectories Using a Lagrange Multipliers Recurrent Network

2.4.3 Implementation of Control and State Limits

In many control applications, there may be physical limits on the values of the control or
state variables. Simple inequality constraints can represent these, and can be implemented easily by
constantly monitoring the state and control variables. If these variables are at the specified limits,
and the rate of change of these variables is in the direction of exceeding these limits, then their rate
of change is set to zero, and they are held at the specified limits. This is easy to do in hardware, by
passing the controls or states through squashing functions, such as sigmoids, which limit the
outputs of these controls and states to the desired values. The addition of the squashing functions
will not alter the stability of the network, but it might result in a change of the rate of convergence.

If the linear inequality constraints are each a function of more than one control and/or state
variable, it might be possible to implement these constraints by transforming them into equality
constraints using slack variables.
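The rate-zeroing rule for simple limits can be sketched in a few lines. This is a software illustration under stated assumptions, not a hardware design: the dynamics are a generic linear network dv/dt = W v + bias integrated with an Euler scheme, and the function names, limits and example values are hypothetical.

```python
import numpy as np

def limited_rate(v, dv, lower, upper):
    """Zero the rate of any variable that sits at a limit and is moving outward."""
    at_upper = (v >= upper) & (dv > 0)
    at_lower = (v <= lower) & (dv < 0)
    return np.where(at_upper | at_lower, 0.0, dv)

def integrate_with_limits(W, bias, v0, lower, upper, dt=1e-2, steps=5000):
    """Euler integration of dv/dt = W v + bias with box limits on v."""
    v = np.clip(np.asarray(v0, dtype=float), lower, upper)
    for _ in range(steps):
        dv = limited_rate(v, W @ v + bias, lower, upper)
        v = np.clip(v + dt * dv, lower, upper)   # hold variables at the limits
    return v

# Without limits the equilibrium would be [2.0, 0.5]; the first variable is limited to 1.
W = -np.eye(2)
bias = np.array([2.0, 0.5])
v_final = integrate_with_limits(W, bias, v0=np.zeros(2), lower=-1.0, upper=1.0)
print(v_final)
```

The first variable saturates at its upper limit and is held there, while the second converges to its unconstrained value.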

2.4.4  Conclusion 

In this chapter, we analyzed three different methods for finding the optimal control
trajectories for linear systems, given an objective function. We tested the different methods on a
linearized aircraft model of longitudinal dynamics. A summary of the properties of the different
techniques is shown in table 2.4.4-1. From this table, the Lagrange multipliers method satisfies most of
the desired objective criteria.


Table 2.4.4-1: Comparison Between Three Different Recurrent Neural Network
Implementations of Optimal Control

                          Penalty Method        Gradient Projection   Lagrange Multipliers
Accuracy                  Not exact             Exact                 Exact
Robustness                Robust                Not robust            Robust
Relative Network Size     Small                 Small                 Large
Network Connectivity      Sparse                Dense                 Sparse
Convergence Properties    Slow                  Fast                  Fast
W Matrix Computation      Simple mapping of     Matrix inversion      Simple mapping of
                          A, B, Q, R, K                               A, B, Q, R

3. PARAMETER IDENTIFICATION

The real-time optimal controllers developed in the previous chapter require an adaptive
real-time parameter estimation module. The role of the parameter estimation module is to provide
the controllers with fast and accurate estimates of the linearized system matrices, by measuring the
input/output history of the different state and control variables of the aircraft. We can look at the
parameter ID problem as a mapping from input-output data histories to unknown parameters,
defined with respect to a particular class of models, for instance linear state space systems.
Techniques for parameter estimation typically involve finding the parameters that optimize some
objective criterion, such as minimizing the error between the measured and predicted outputs, or
maximizing the likelihood that the input-output data result from the proposed model. More
recently, there has been an increasing interest in state space parameter estimation using subspace
model identification techniques. State space subspace system identification (S4ID) methods use
linear algebra tools, such as QR and Singular Value Decomposition, to find the subspace that best
fits the input-output data. If we have only input-output data measurements, the definition of states
is not unique.

In this chapter, we will explore the use of different neural network architectures for the
implementation of real-time parameter identification techniques. Neural networks offer many
desirable features that might be useful for parameter identification. These features include:

•  Fast optimization of linear and nonlinear objective functions: Hopfield-type neural networks,
discussed in the previous chapter, possess such a capability. This feature can be used to
implement optimization-based system identification algorithms in real time. In a more
indirect way, these networks are also capable of implementing S4ID estimation algorithms,
by performing many of the matrix computations involved in implementing S4ID algorithms,
such as computing QR and Singular Value Decomposition.

•  Ability to store knowledge and previous experience about the dynamic system. This feature
may turn out to be particularly important in data-poor environments, where the states are
either unavailable for measurement or highly corrupted with noise. The availability of
previous knowledge may reduce the amount of data required to accurately estimate the
parameters. For example, in normal aircraft operation, a content addressable memory can be
trained to generate the Jacobian matrices of the dynamic system, given only current
measurements of the state and control vectors. This is not possible with conventional
parameter estimation techniques, where usually the history of the state and control variables
is needed for a robust estimation. In addition, the control signal should be persistently
exciting. Of course, since the memory-based network bases its decision only on the current
state and control variables and its previous experience about the behavior of the dynamic
system in similar conditions, it may not detect a fast change in the dynamics of the system.
Therefore, the memory-based network can augment a conventional parameter estimation
technique which observes both state and control histories, but cannot replace it completely.
Both modules can work in parallel, with the conventional parameter estimation technique
providing continuous training for the memory-based network.

3.1  Previous  Neural  Network  Approaches  to  Parameter  Estimation 

There has been relatively little published research on using neural networks for the
estimation of parameters of dynamic systems. This may be due to the success of conventional
parameter estimation techniques and the relatively small added value that neural networks may
offer. Wang and Mendel (1991) trained a structured network to estimate the parameters of a
moving average (MA) process using higher order statistics of the outputs. They also proposed
ways of extending their technique to autoregressive moving average (ARMA) processes.
Other researchers have used limited histories of input/output data to train a feedforward network to
estimate the parameters of a dynamic process (Samad and Mathur, 1991; Foslien, Konar and
Samad, 1992). One obvious problem with this approach is that the dimension of the feedforward
neural network must be very large for robust identification. Chu, Shoureshi and colleagues proposed
using Hopfield networks for the identification of the parameters of linear systems in state-space
form (Chu, Shoureshi and Healey, 1992; Chu and Shoureshi, 1992; Chu, Shoureshi and
Tenorio, 1990).

3.2  Implementation  of  Optimization-Based  Parameter  ID  Using  Hopfield 
Networks 

In  this  section,  we  describe  optimization-based  methods  for  the  estimation  of  parameters  of 
dynamic  systems  represented  in  state-space  form.  We  then  discuss  how  these  methods  can  be 
implemented  using  Hopfield  neural  networks.  We  extend  the  work  of  Chu  and  Shoureshi  to 
include  the  estimation  of  parameters  in  the  case  of  known  relationships  between  parameters.  We 
will  also  discuss  how  to  implement  a  forgetting  factor  for  fast  adaptation. 

A Hopfield-based parameter identifier is illustrated in figure 3.2-1. The connectivity matrix
encodes the input/output response history of the dynamic system, measurement error covariances
and known parameter relationships. Initial parameter estimates, if available, help initialize the state
of the network. We will now describe in detail how to synthesize the parameter ID Hopfield
network to satisfy the above requirements.

•  Measurement noise variance

•  Initial parameter estimates

•  Known parameter relationships

Figure 3.2-1: A Hopfield Network Parameter Identification Module for Linear Systems

3.2.1 Hopfield Parameter ID Network

Given a particular model structure with unknown parameters, a parameter ID problem can
be defined as finding the parameters which optimize a certain objective function, such as the
minimization of the error between model predictions and measured outputs, or the maximization of
the likelihood that the unknown parameters produced the measured data. As we demonstrated in
the previous chapters, recurrent networks may solve such optimization problems. We present here
the derivation of a Hopfield parameter ID recurrent network for linear systems and discuss its [...]

The system state equations for a discretized linear system can be defined as:

$$x_{k+1} = A x_k + B u_k$$    (3.2.1-1)

Our objective is to find the matrices A and B that, given $x_k$ and $u_k$, can predict, in a least squares
sense, the output. If we define the prediction error

$$e_k = x_{k+1} - A x_k - B u_k$$    (3.2.1-2)

where $e_k$ is an n-dimensional vector, and define an energy function

$$E = \frac{1}{2N} \sum_{k=0}^{N-1} e_k^T e_k$$    (3.2.1-3)

it is easy to prove that the matrix differential equations:

$$\frac{dA}{dt} = -\epsilon \frac{\partial E}{\partial A}$$    (3.2.1-4)

$$\frac{dB}{dt} = -\epsilon \frac{\partial E}{\partial B}$$    (3.2.1-5)

have an equilibrium point at the minimum of the energy function; therefore the matrices A and B
converge to the optimal least squared error solution.

Substituting for the energy function from equation (3.2.1-3) above, the differential
equations (3.2.1-4) and (3.2.1-5) can be written as:

$$\frac{d}{dt} \begin{bmatrix} A^T \\ B^T \end{bmatrix} = -\frac{\epsilon}{N} \underbrace{\begin{bmatrix} \sum x_k x_k^T & \sum x_k u_k^T \\ \sum u_k x_k^T & \sum u_k u_k^T \end{bmatrix}}_{\text{Connectivity matrix } W} \begin{bmatrix} A^T \\ B^T \end{bmatrix} + \frac{\epsilon}{N} \underbrace{\begin{bmatrix} \sum x_k x_{k+1}^T \\ \sum u_k x_{k+1}^T \end{bmatrix}}_{\text{Bias matrix } S}$$    (3.2.1-6)

where the sums run over k = 0, ..., N-1 and N is the number of data points in the measurement
window. The dimension of the connectivity matrix W is (n+m) × (n+m), where n is the state
dimension and m is the control dimension. The bias matrix S is of dimension (n+m) × n. The
number of units in the Hopfield network representing the above system of equations is the same as
the number of elements of S, namely n(n+m).
Monotonically increasing functions may be added to the above system of equations, but they are
not necessary. It is important to realize that the connectivity matrix W is not constant and is a
function of the correlations between input and output data. This makes it more difficult to
implement in hardware.
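The parameter ID network can be simulated directly from its defining equations. The sketch below is illustrative and rests on several assumptions: the data come from a hypothetical noise-free second-order system, the factor 1/N is folded into W and S, the flow is integrated with an Euler scheme, and the function name `hopfield_param_id` is hypothetical.

```python
import numpy as np

def hopfield_param_id(X, U, eps=1.0, steps=5000):
    """Integrate dTheta/dt = -eps * (W Theta - S) with Theta = [A^T; B^T].

    X has shape (N+1, n) and holds x_0 ... x_N; U has shape (N, m).
    """
    N, n = U.shape[0], X.shape[1]
    Z = np.hstack([X[:-1], U])                    # rows are [x_k^T, u_k^T]
    W = Z.T @ Z / N                               # (n+m) x (n+m) connectivity matrix
    S = Z.T @ X[1:] / N                           # (n+m) x n bias matrix
    dt = 1.0 / (eps * np.linalg.norm(W, 2))       # stable Euler step
    Theta = np.zeros((Z.shape[1], n))             # one unit per element of S
    for _ in range(steps):
        Theta += dt * eps * (S - W @ Theta)
    return Theta[:n].T, Theta[n:].T

# Hypothetical system used only to generate test data.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
rng = np.random.default_rng(5)
N = 200
U = rng.standard_normal((N, 1))                   # persistently exciting input
X = np.zeros((N + 1, 2))
for k in range(N):
    X[k + 1] = A_true @ X[k] + B_true @ U[k]

A_hat, B_hat = hopfield_param_id(X, U)
print(np.round(A_hat, 4))
print(np.round(B_hat, 4))
```

The rate of convergence of this flow depends on the condition number of W, which is why the excitation of the input and the scaling of the variables matter in practice.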

The above parameter ID Hopfield network may be extended in different ways to obtain
better estimates of the parameters, as described below:

Weighted Least Squares Estimation

For nonlinear and time-varying systems, the system parameters change with time. To track
these time-varying parameters, more recent data are considered to be more relevant and are given more
weight. The weighting parameter, also called the forgetting factor, can be easily incorporated into the
connectivity matrices and bias terms of the Hopfield network, as shown by equation (3.2.1-7).

$$W = \sum_{k=0}^{N-1} \lambda_k \begin{bmatrix} x_k x_k^T & x_k u_k^T \\ u_k x_k^T & u_k u_k^T \end{bmatrix}, \qquad S = \sum_{k=0}^{N-1} \lambda_k \begin{bmatrix} x_k x_{k+1}^T \\ u_k x_{k+1}^T \end{bmatrix}$$    (3.2.1-7)

where $\lambda_k$ is the forgetting factor, usually chosen to be $\lambda_k$ = [?] with 0 < α < 1.

Identification of Parameters With Equality Constraints

As pointed out by Chandler, Pachter and Mears (1993), the exploitation of a priori
information about relationships between the different parameters to be identified reduces the
uncertainty in the estimation of these parameters. As mentioned in the previous chapter, it is
possible to implement linear constraints in a Hopfield network using Lagrange multipliers.
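The effect of the forgetting factor in the weighted least squares extension can be illustrated on a system whose parameter changes during the measurement window. This is a sketch under stated assumptions: the exponential weighting λ_k = α^(N-1-k) is one common choice and is assumed here, the scalar system is hypothetical, and the network equilibrium is computed by solving W Θ = S directly rather than by simulating the network.

```python
import numpy as np

def weighted_least_squares_id(X, U, alpha):
    """Equilibrium of the weighted network: solves W Theta = S, Theta = [A^T; B^T]."""
    N, n = U.shape[0], X.shape[1]
    lam = alpha ** np.arange(N - 1, -1, -1)       # assumed weights: lam[k] = alpha**(N-1-k)
    Z = np.hstack([X[:-1], U])                    # rows are [x_k^T, u_k^T]
    W = Z.T @ (lam[:, None] * Z)
    S = Z.T @ (lam[:, None] * X[1:])
    Theta = np.linalg.solve(W, S)
    return Theta[:n].T, Theta[n:].T

# Hypothetical scalar system whose pole jumps from 0.2 to 0.9 halfway through the window.
rng = np.random.default_rng(3)
N = 200
U = rng.standard_normal((N, 1))
X = np.zeros((N + 1, 1))
for k in range(N):
    a_k = 0.2 if k < N // 2 else 0.9
    X[k + 1] = a_k * X[k] + 1.0 * U[k]

A_plain, _ = weighted_least_squares_id(X, U, alpha=1.0)    # no forgetting
A_forget, _ = weighted_least_squares_id(X, U, alpha=0.9)   # exponential forgetting
err_plain = abs(A_plain[0, 0] - 0.9)
err_forget = abs(A_forget[0, 0] - 0.9)
print(f"error without forgetting: {err_plain:.3e}")
print(f"error with forgetting:    {err_forget:.3e}")
```

With forgetting, the old data carry negligible weight and the estimate tracks the most recent value of the parameter; without it, the estimate is a compromise between the two regimes.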

3.2.2 Practical Issues in the Implementation of State-Space Parameter ID
Using Hopfield Networks

Since the Hopfield network is an implementation of the classical optimization-based
parameter ID methods, it shares many of the practical issues associated with these techniques, such
as the requirement of persistently exciting inputs and the ability to stably track the parameters of a
nonlinear or time-varying system. In addition, there are other problems that are due to the Hopfield
implementation. One major problem is the possibility of ill-conditioning of the matrix W. This
can be remedied by scaling the different input and state variables, or equivalently, by using
different time constants for the different parameters.
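The benefit of scaling can be seen on a two-variable example. The sketch below uses hypothetical data in which one variable is of order 0.05 (for instance an angle in radians) and the other of order 500 (for instance an altitude perturbation in feet); the numbers are assumptions chosen only to show the effect on the condition number of W.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
# Two hypothetical measured variables with very different magnitudes.
Z = np.column_stack([0.05 * rng.standard_normal(N),
                     500.0 * rng.standard_normal(N)])

W = Z.T @ Z / N                                   # correlation-based connectivity matrix
scale = 1.0 / np.sqrt(np.diag(W))                 # scale each variable to unit RMS value
W_scaled = (scale[:, None] * W) * scale[None, :]

cond_raw = np.linalg.cond(W)
cond_scaled = np.linalg.cond(W_scaled)
print(f"cond(W) without scaling: {cond_raw:.2e}")
print(f"cond(W) with scaling:    {cond_scaled:.2e}")
```

Scaling the variables is equivalent to a change of coordinates on the parameters, so the least squares solution is recovered by undoing the scaling after convergence.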

3.2.3  Flight  Control  Example 

In this example, we used a Hopfield network to identify the parameters of the linearized
longitudinal dynamics model of the F/A-18 aircraft at an altitude of 30,000 ft and a speed of 700
ft/sec. We used the optimal control inputs that were generated using the Lagrange multipliers
Hopfield network optimal controller. A window of 100 data points at a sampling rate of 20 Hz was
used in the identification. We added white noise to the state variables with a magnitude equal to
10% of their variance. We used the parameters estimated using the Hopfield network and the
control inputs to reconstruct the states. Figure 3.2.3-1 shows a comparison between the noisy state
measurements and the state trajectories generated using the estimated parameters. As shown in the
figure, the Hopfield network was able to reproduce the states with high accuracy.

In the above example, in order to make the Hopfield network work, we had to scale the
altitude variable. Without scaling, the connectivity matrix becomes extremely ill-conditioned and
requires a very small integration step.

Although Hopfield-type neural networks are capable of performing state space parameter
identification, as shown in the above example, they may not have any advantage over current
techniques for small systems.

Figure 3.2.3-1: Noisy State Measurements (Dotted Lines) and Estimated States
(Solid Lines) Using Hopfield Parameter ID

4. LEARNING THE INVERSE DYNAMICS

We investigate in this chapter learning the inverse dynamics for use in computing the
dynamic trim for aircraft FCS applications.

Problem Formulation

The inverse dynamics of a plant defined by the state equations (4-1):

$$\dot{x} = f(x, u)$$    (4-1)

where $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^m$, can be defined as:

$$u_i = f_i^{-1}(x, \dot{x}, u_{j \ne i}), \qquad i = 1, \ldots, m$$    (4-2)

where $u_{j \ne i}$ is an (m-1)-dimensional vector formed by excluding the control variable $u_i$ from the
control vector u. In general, the inverse $f^{-1}(\cdot)$ may not exist. If the control variables represent a
nonredundant system, that is, there is a unique control vector u which produces a given desired rate
of change of the states $\dot{x}$, then the dependence of each control variable on the other control
variables may be dropped and the inverse equations may be written as shown in equation (4-3):

$$u = g(x, \dot{x})$$    (4-3)


However, it must be noted here that the inverse function $g(\cdot)$ is not defined everywhere. It
is only defined in a small submanifold $\Gamma \subset \mathbb{R}^{2n}$. The submanifold $\Gamma$ defines the set of reachable
states and achievable rates of change of the states. For example, for a linear system, the achievable
rates of change of the states from a particular state are defined by the hyperplane:

$\dot{x} = Ax + Bu$  (4-4)

For example, if the system is second order, $B = [0\ 1]^T$ and $x = [0\ 0]^T$, then
the vector $\dot{x}$ can only be in the direction of $B = [0\ 1]^T$. In such a case the inverse function
$u = g(x, \dot{x})$ at $x = [0\ 0]^T$ is not defined for any $\dot{x} \neq [0\ a]^T$. If the number of control variables $m$ is
equal to the order of the system $n$, the columns of the matrix $B$ are all independent, and there are no
limits on the control variables, then the submanifold $\Gamma = \mathbb{R}^{2n}$. Obviously these conditions are too
restrictive and are seldom met in practice. Therefore, it is important to define the submanifold
where the inverse is valid. Since the desired rates of change of the states do not, in general,
coincide with the submanifold where the inverse is defined, it is important to find a suitable
approximation to the inverse function $g(\cdot)$ in the region where such an inverse does not exist. This
can be done either during the training of a neural network to perform the inverse or even during run
time. For example, at run time, one possible method to approximate the inverse outside the
manifold where it is defined is to project the desired rates of change of the states into the manifold
$\Gamma$ and then use the inverse model. There is another approach that can be used to overcome the
problem that the dimensionality of the rates of change of the states is usually higher than the
dimensionality of the available controls: to define the inverse function $u = g(x, v)$, where the
vector $v$ is of the same dimension as the control vector $u$. We can then project the desired rates of
change of the states to the variable $v$, using for example a gain matrix $K$ (e.g. $v = K\dot{x}$).
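
For a linear system, the run-time projection mentioned above can be illustrated with a least-squares projection of the desired rate onto the achievable set $\{Ax + Bu\}$. The Python sketch below is only an illustration and is not taken from the study: the function name `project_rate`, the double-integrator matrix `A`, and the restriction to a single control input are all assumptions made to keep the example short.

```python
def project_rate(A, B, x, xdot_des):
    """Project a desired state rate onto the achievable set {A x + B u}
    of a single-input linear system (B is one column, given as a list)."""
    n = len(x)
    # Drift term A x, which the control cannot change.
    drift = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Part of the desired rate that the control would have to supply.
    resid = [xdot_des[i] - drift[i] for i in range(n)]
    # Least-squares control u = (B^T B)^-1 B^T resid (a scalar for one input).
    btb = sum(b * b for b in B)
    u = sum(B[i] * resid[i] for i in range(n)) / btb
    # Closest achievable rate to the desired one.
    xdot_proj = [drift[i] + B[i] * u for i in range(n)]
    return u, xdot_proj


# Second-order example from the text: B = [0 1]^T and x = [0 0]^T.
A = [[0.0, 1.0], [0.0, 0.0]]   # assumed dynamics matrix (illustrative only)
B = [0.0, 1.0]
u, xdot = project_rate(A, B, [0.0, 0.0], [2.0, 3.0])
```

With these values the first component of the requested rate cannot be produced by any control, so the projection keeps only the component along $B$.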

Methods  for  training  neural  networks  to  perform  the  inverse  function  will  be  discussed  in 
detail  in  the  next  section. 

4.1  The  Inverse  Model  as  a  Controller 

In general, control system design can be viewed as an attempt to produce a well behaved
inverse of the plant dynamics that has desirable robustness, stability and dynamic response
characteristics. An exact inverse model of the dynamics is not always desirable if it does not
possess good dynamic characteristics. In this chapter we discuss different techniques for building
an adaptive neural network model of the inverse dynamics. We develop one such network for
modeling the inverse longitudinal dynamics of a nonlinear F/A-18 simulator. We discuss methods
for overcoming many of the shortcomings of inverse dynamics control.

There  are  many  different  ways  to  train  a  neural  network  to  produce  the  inverse  of  a 
dynamic  system.  Three  of  these  ways  are  shown  schematically  in  figure  4.1-1.  Kawato  and  Gomi 
(1992)  summarize  the  advantages  and  disadvantages  of  each  of  these  three  techniques.  The 
simplest possible scheme for acquiring an inverse is what is called direct inverse modeling
(figure 4.1-1.a). In this scheme, the inverse model observes the realized trajectory of the plant and
attempts to estimate the control command that generated that trajectory. The error between the
inverse model estimate and the actual control command is used to train the inverse model. This
scheme has been successfully applied by Atkeson (1989) to train a neural network to estimate the
torques necessary to drive a robot arm along a desired trajectory. However, this technique by itself
does not solve the problems associated with inverse modeling discussed in the previous section,
namely that it will fail when the inverse is not unique. This problem can be solved by adding extra
constraints which make the network choose only a unique inverse. Also, this technique will fail
when the desired state change cannot be achieved, since the neural network will not have any
training examples in that region. One solution in such cases might be to use the control values of
the nearest experiences.
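
The data collection behind direct inverse modeling, and the nearest-experience fallback suggested above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the scalar `plant`, the sampling ranges, and the helper names are hypothetical, and a lookup over stored experiences stands in for a trained network.

```python
import random


def plant(x, u):
    """Hypothetical scalar plant: returns the state rate xdot = f(x, u)."""
    return -0.5 * x + 2.0 * u


def collect_experiences(n, seed=0):
    """Apply random controls at random states and record (x, xdot, u)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((x, plant(x, u), u))
    return data


def nearest_experience_control(data, x, xdot_des):
    """Return the control of the stored experience closest to (x, xdot_des)."""
    best = min(data, key=lambda d: (d[0] - x) ** 2 + (d[1] - xdot_des) ** 2)
    return best[2]


data = collect_experiences(2000)
u_in = nearest_experience_control(data, 0.2, 0.5)    # achievable request
u_out = nearest_experience_control(data, 0.0, 10.0)  # request outside the data
```

For the achievable request the returned control is close to the exact inverse of the toy plant (0.3). For the unachievable request the lookup still returns a control inside the sampled range, which is the behavior the nearest-experience idea is meant to provide.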

The second approach to learning the inverse dynamics, developed and used by Jordan
(Jordan, 1989; Jordan and Rumelhart, 1992), is to first transform errors in the states $\tilde{x}$ to errors
in control $\tilde{u}$ by propagating $\tilde{x}$ backward through a forward model of the plant dynamics, as
shown in figure 4.1-1.b. This approach solves the problem of finding the control values when the
inverse is not unique, since the objective function to be minimized in this case is the output error.
This is unlike the direct inverse, where the learning objective is to minimize the error in the
estimated control. Moreover, it is possible to add a regularization term to the backpropagation
algorithm to achieve better control qualities. For example, the objective function in training the
inverse model for a discrete dynamic system can be expressed as shown in equation (4.1-1) below:

$\min_{u_k}\ (F_N(x_k,u_k,W)-y_k)^T Q\,(F_N(x_k,u_k,W)-y_k) + u_k^T R\,u_k$  (4.1-1)

where $F_N(x_k,u_k,W)$ represents the neural network forward model, $y_k$ is the desired output vector
for training point $k$, and $x_k$ is the output of the plant resulting from applying control $u_k$. The
matrices $Q$ and $R$ are weighting matrices.

a. Direct Inverse Learning

b. Inverse Learning Using Forward Model Error Backpropagation

c. Inverse Learning Using Feedback Error

Figure 4.1-1. Different Techniques for Learning the Inverse Model

$W$ represents the parameters of the feedforward neural network. The change in the control for
updating the value of $u_k$ can be found using any of the nonlinear optimization techniques. For
example, using gradient descent this update rule can be written as:

$\tilde{u}_k = -\eta\left[\left(\frac{\partial F_N}{\partial u_k}\right)^T Q\,\tilde{x}_k + R\,u_k\right]$  (4.1-2)

where $\tilde{x}_k = F_N(x_k,u_k,W) - y_k$ is the output error and $\eta$ is a step size that absorbs the factor of 2
obtained when differentiating (4.1-1). The update rule for the inverse model neural network
parameters can then be derived:

(4.1-3)

where $IM(x_k, x_{k+1}, V)$ represents the inverse model and $V$ represents the inverse-model neural
network parameters. One of the advantages of the backpropagation through the forward model
technique is that it is possible to make the inverse more sensitive to a particular state through the
choice of the $Q$ matrix. For example, if the measurements of a particular state are very noisy, we
can reduce the sensitivity of the inverse with respect to that particular state by reducing the
corresponding values of the $Q$ matrix. It is also possible to regulate the degree of utilization of the
different controls through a proper choice of the matrix $R$. Of course, the major disadvantage of
the backpropagation through the forward model technique is that it requires building a forward
model of the dynamic system in addition to the inverse model. An approach similar to the
feedforward backpropagation has been suggested by a few researchers. In this approach, only the
forward model is learned and adapted to changes in plant dynamics. Then, using nonlinear
root-finding techniques, similar to those used in aircraft trimming, it may be possible to find the
controls that produce the desired response. The obvious disadvantage of this approach is that it
requires real-time solution of coupled nonlinear equations, which may be computationally
expensive.
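
To make the roles of $Q$ and $R$ concrete, the following Python sketch minimizes an objective of the form (4.1-1) over a single control by gradient descent through a forward model. It is an illustration under stated assumptions rather than the implementation used here: `forward_model` is a hypothetical toy function standing in for a trained network, $Q$ is taken as diagonal, $R$ as a scalar, and the gradient is obtained by finite differences instead of backpropagation.

```python
def forward_model(x, u):
    """Hypothetical stand-in for a learned forward model: next state of a toy plant."""
    dt = 0.1
    return [x[0] + dt * x[1], x[1] + dt * (u - x[1] ** 3)]


def cost(x, u, y, q, r):
    """(F - y)^T Q (F - y) + u R u, with diagonal Q (list q) and scalar R."""
    f = forward_model(x, u)
    return sum(qi * (fi - yi) ** 2 for qi, fi, yi in zip(q, f, y)) + r * u * u


def optimize_control(x, y, q, r, u0=0.0, rate=5.0, steps=200, h=1e-6):
    """Gradient descent on the control; the gradient is taken numerically."""
    u = u0
    for _ in range(steps):
        grad = (cost(x, u + h, y, q, r) - cost(x, u - h, y, q, r)) / (2 * h)
        u -= rate * grad
    return u


x = [0.0, 1.0]
y = [0.1, 1.05]                                           # desired next state
u_free = optimize_control(x, y, q=[1.0, 1.0], r=0.0)      # no control penalty
u_reg = optimize_control(x, y, q=[1.0, 1.0], r=0.01)      # control use penalized
```

In this toy case the unpenalized control converges to the value that reproduces the desired output, while a nonzero $R$ trades output accuracy for a smaller control, which is the regulating effect of $R$ described above.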

The third approach to learning an inverse model, shown in figure 4.1-1.c and called
feedback error learning, is described by Kawato and Gomi (1992). In this technique, the output of
an error feedback controller is used as an error signal to train the inverse model. It is important to
note that even though the feedback controller may be linear, the neural network inverse model is, in
general, nonlinear after training. However, the learned inverse model still depends on the quality
of the feedback controller and its preferences. For instance, if the feedback controller consists of
high feedback gains corresponding to unstable states and lower gains for originally stable states,
the resultant inverse model will also reflect the same priorities. The same can also be said for the
relative utilization of the different control variables.
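
The mechanism of feedback error learning can be illustrated on a toy problem. The Python sketch below is an assumption-laden illustration and not the scheme of Kawato and Gomi as implemented anywhere in this report: the first-order plant $\dot{x} = -ax + bu$, the sinusoidal desired trajectory, the gains, and the linear-in-parameters inverse model $u_{ff} = w_1 x_d + w_2 \dot{x}_d$ are all hypothetical choices.

```python
import math


def feedback_error_learning(a=1.0, b=2.0, gain=5.0, rate=2.0, dt=0.01, t_end=100.0):
    """Train u_ff = w1*x_d + w2*xdot_d for the toy plant xdot = -a*x + b*u,
    using the output of a proportional feedback controller as the error signal."""
    w = [0.0, 0.0]
    x = 0.0
    for k in range(round(t_end / dt)):
        t = k * dt
        x_d, xdot_d = math.sin(t), math.cos(t)   # desired trajectory and its rate
        u_fb = gain * (x_d - x)                  # conventional feedback controller
        u_ff = w[0] * x_d + w[1] * xdot_d        # inverse model output
        # The feedback command is used as the training error of the inverse model.
        w[0] += dt * rate * u_fb * x_d
        w[1] += dt * rate * u_fb * xdot_d
        x += dt * (-a * x + b * (u_ff + u_fb))   # Euler step of the plant
    return w


w = feedback_error_learning()
```

For this plant the exact inverse is $u = (\dot{x} + a x)/b$, so the learned weights should approach $a/b$ and $1/b$ (both 0.5 with the default values), at which point the feedback controller contributes almost nothing.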

4.2 Learning the Inverse Longitudinal Trim Function of the F/A-18

A  neural  network  can  be  trained  to  produce  the  control  surface  commands  to  trim  the 
aircraft  at  given  flight  conditions  and  aircraft  trim  states.  The  inverse  trim  network  is  a  specialized 
model  of  the  full  inverse  dynamics,  where  the  desired  rates  of  change  of  the  states  are  all  set  to 
zero.  This  reduces  the  input  space  of  the  inverse  trim  network  considerably  and  makes  it  much 
easier  to  train  with  fewer  training  examples.  Although  trimming  the  aircraft  can  be  done  using 
conventional numerical-trim-finding algorithms, the advantage of using a neural network approach
is the possible adaptation to aircraft dynamics.

We implemented an inverse trim neural network of the longitudinal dynamics of the F/A-18
aircraft. The inverse trim neural network consisted of two inputs and three outputs. It took as
input the desired altitude and speed of the aircraft and produced as output the elevator angle ($\delta_e$),
the total thrust and the angle of attack ($\alpha$) required to trim the aircraft.

The first step to train the inverse trim neural network was to generate a trim database. We
used a numerical-trim-finding algorithm to generate the database. The training set contained 200
data points. Each data point consisted of the trim quintuple (speed $V_T$, altitude $h_T$, angle of attack
$\alpha_T$, elevator angle $\delta_{eT}$ and the thrust $\delta_{thT}$). The values of $V_T$ and $h_T$ were randomly and uniformly
chosen to be in the range of [300 - 900 ft/sec] and [0 - 60,000 ft], respectively. The trim condition
was for level flight; therefore the pitch angle $\theta$ was set equal to the angle of attack $\alpha$ and the pitch
rate $q$ was equal to zero. Therefore we did not need to include the values of $\theta$ and $q$ in the computation
of the inverse trim model.

The  neural  network  model  for  the  inverse  trim  consisted  of  three  HyperBF  networks  with 
Gaussian  basis  functions  of  variable  widths.  Each  HyperBF  network  computed  one  of  the  trim 
variables $\alpha_T$, $\delta_{eT}$ and $\delta_{thT}$. The HyperBF network structure and the training algorithms used to
compute the network parameters are presented in detail in Appendix A. Each network contained 80
basis functions, with centers randomly and uniformly distributed in the range of the training set.
For each network, the parameters to be identified consisted of the 80 linear coefficients $c_i$ and 2
parameters for a diagonal weight matrix $W$ (see Appendix A for an explanation of the meaning of
the parameters). We used a least square error minimization between the estimated and exact output
values as the criterion for choosing the network parameters. The relative rms error defined as:


was used to evaluate the performance on the training set, where $x_{ti}$ represents the exact trim value,
$x_{ei}$ is the estimated value, and 200 is the number of data points in the training set. This error
was found to be less than 0.01 for all three variables $\alpha_T$, $\delta_{eT}$ and $\delta_{thT}$.
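
Because the output of a Gaussian basis function network is linear in its coefficients once the centers and widths are fixed, the coefficients can be found by ordinary least squares. The Python sketch below illustrates this on a one-dimensional toy function and then evaluates a relative rms error. Everything in it is an assumption made for illustration: the target function, the 8 centers, the width, the small ridge term added for numerical conditioning, and in particular the relative rms definition (root of summed squared errors divided by summed squared exact values), which is one common choice and is not necessarily the definition used in this report.

```python
import math


def gaussian_features(x, centers, width):
    """Gaussian basis function activations for a scalar input x."""
    return [math.exp(-((x - c) / width) ** 2 / 2.0) for c in centers]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_coefficients(xs, ys, centers, width, ridge=1e-8):
    """Least-squares linear coefficients from lightly regularized normal equations."""
    phi = [gaussian_features(x, centers, width) for x in xs]
    n = len(centers)
    ata = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    atb = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(n)]
    return solve(ata, atb)


def predict(x, coeffs, centers, width):
    """Network output: weighted sum of the Gaussian activations."""
    return sum(c * g for c, g in zip(coeffs, gaussian_features(x, centers, width)))


def relative_rms(exact, estimated):
    """Assumed definition: sqrt(sum of squared errors / sum of squared exact values)."""
    num = sum((t - e) ** 2 for t, e in zip(exact, estimated))
    den = sum(t ** 2 for t in exact)
    return math.sqrt(num / den)


centers = [j / 7.0 for j in range(8)]
xs = [i / 49.0 for i in range(50)]
ys = [2.0 + math.sin(2.0 * math.pi * x) for x in xs]
coeffs = fit_coefficients(xs, ys, centers, width=0.15)
err = relative_rms(ys, [predict(x, coeffs, centers, 0.15) for x in xs])
```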

To  test  the  performance  of  the  inverse  trim  network,  we  compared  its  output  with  the  exact 
output  generated  using  a  numerical  trim  solver.  The  data  used  in  the  test  set  were  different  from 
those  used  in  training  the  inverse  trim  networks.  Figure  4.2-1  compares  the  exact  outputs  with  the 
outputs  generated  using  the  inverse  trim  networks  at  different  trim  states.  The  results  shown  are  for 
trimming the aircraft at different altitudes and at a constant trim speed of 700 ft/sec. As shown
in the figures, the estimated trim values match the numerically computed ones. The error is higher
near the boundaries, where the amount of training data is not sufficient for exact generalization.


Figure 4.2-1. Comparison Between Neural Network Inverse Model For Trim
(Solid Line) And Numerically Computed Trim Values (Dotted Line) At A Constant
$V_T$ = 700 Ft/Sec.

4.3 Learning the Inverse Longitudinal Dynamics of an F/A-18 Aircraft

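Modeling the inverse trim function presented in the previous section is a specialized version
of a full inverse model. Equation (4.3-1) defines a general inverse model for the longitudinal
dynamics of an aircraft.

$\delta_e = g_1(\alpha, \theta, h, v, q, \dot{\alpha}, \dot{\theta}, \dot{h}, \dot{v}, \dot{q})$

$\delta_{thr} = g_2(\alpha, \theta, h, v, q, \dot{\alpha}, \dot{\theta}, \dot{h}, \dot{v}, \dot{q})$  (4.3-1)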

The fact that we can choose all the states and their rates of change independently makes the
input space much larger than in the case of the inverse trim model. For the inverse longitudinal
dynamics model of the F/A-18 that we used, the input space was formed of five states and their
rates of change, for a total of 10 dimensions, as shown in equation (4.3-1). There were two
outputs, the stabilator angle and the thrust. We used two HyperBF networks with Gaussian units,
one for each output. The Gaussian RBF units had variable widths along the different input
dimensions. The HyperBF parameters were estimated using the heuristic method described in
Appendix A, in which the widths of the Gaussians are a function of the average sensitivity of the
inverse model in the different directions, in addition to the range of the different variables.
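
The effect of variable widths along the input dimensions can be seen in a small evaluation routine. Appendix A is not reproduced in this chapter, so the functional form used below, $\sum_i c_i \exp(-\|W(x - t_i)\|^2)$ with a diagonal $W$, is the commonly used Gaussian HyperBF form and should be read as an assumption; the function name `hyperbf` and the numbers are purely illustrative.

```python
import math


def hyperbf(x, centers, coeffs, w_diag):
    """Evaluate sum_i c_i * exp(-||W (x - t_i)||^2) for a diagonal W.

    x       : input vector
    centers : list of center vectors t_i
    coeffs  : linear coefficients c_i
    w_diag  : diagonal entries of W (one scale per input dimension)
    """
    total = 0.0
    for t, c in zip(centers, coeffs):
        # Weighted squared distance; a small w_j makes dimension j matter less.
        d2 = sum((w * (xj - tj)) ** 2 for w, xj, tj in zip(w_diag, x, t))
        total += c * math.exp(-d2)
    return total


# One unit centered at the origin; the second input has zero weight,
# so the output does not depend on it at all.
value = hyperbf([1.0, 5.0], [[0.0, 0.0]], [1.0], [1.0, 0.0])
```

A large entry of $W$ gives a narrow Gaussian along that input (high sensitivity), and a small entry gives a wide one, which is how a diagonal $W$ encodes different sensitivities in different directions.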

4.3.1  Training  the  Gaussian  HyperBF  Inverse  Model 

We trained the two HyperBF networks using 2000 training points randomly and uniformly
distributed in the input space. Each training point consisted of twelve dimensions, ten for the
inputs and two for the output of the inverse model. The training points were generated using the
direct inverse method described above. At each given random state, we applied a random input to
the aircraft model and observed the resulting rate of change of the states. We then used the states
and the rates of change of the states as inputs to the neural network inverse model and trained the
network so that its output approximated the controls that produced the change in states. The
network contained 250 Gaussian HyperBF functions, with their centers randomly and uniformly
distributed in the same range as the input data. We tested the resulting inverse model network on a
different 2000 points. We used equation (4.2-1) to quantify the error in the estimation. The relative
rms errors in estimating the stabilator angle and the thrust for both the training and test sets are
shown in table 4.2-1 below:

Table 4.2-1. Relative Rms Error In The Estimation Of The Training And Test Sets
Using A Neural Network Inverse Model.

As shown in the table, although the error in estimating the test set is about twice as large as
the error in estimating the training set, it is still small, given the very small number of
training data relative to the input space. Figures 4.3-1 and 4.3-2 compare the correct outputs with
the outputs computed using the inverse model network, for 50 points of the training and test sets
respectively.


Figure 4.3-1: Performance Of The HyperBF Inverse Dynamics Model On 50
Points Of The Training Set. Dashed Lines Represent The Correct Values And
Solid Lines Are The Estimated Values. Dotted Lines Represent The Errors.

Figure 4.3-2: Performance Of The HyperBF Inverse Dynamics Model On 50
Points Of The Test Set. Dashed Lines Represent The Correct Values And Solid
Lines Are The Estimated Values. Dotted Lines Represent The Errors.


4.4 Using the Direct Inverse Model Network to Control the F/A-18
Longitudinal Dynamics

We tested the trained inverse model network for the control of the longitudinal dynamics of
the F/A-18 aircraft. The control loop used is illustrated in figure 4.4-1.


Figure 4.4-1. Control Loop To Test The Inverse Model Network
(blocks: inverse model; plant $\dot{x} = f(x, u)$)


A desired state trajectory is generated off-line. The difference between the desired and
actual trajectories is multiplied by a gain matrix (a diagonal matrix $K$ in our case), to compute
desired rates of change of the states. The actual states and the desired rates of change of the states
form the inputs to the inverse model networks, which compute the controls to use to reach the
desired trajectories.
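
The structure of this loop can be sketched on a toy problem. The Python fragment below is a hedged illustration only: the scalar plant $\dot{x} = -x^3 + u$, its exact analytic inverse (standing in for a trained network), the gain, and the constant desired trajectory are hypothetical choices; only the 20 Hz Euler step mirrors the sampling rate quoted later in this chapter.

```python
def plant_rate(x, u):
    """Toy scalar plant (assumed): xdot = -x**3 + u."""
    return -x ** 3 + u


def inverse_model(x, xdot_des):
    """Exact inverse of the toy plant; stands in for a trained inverse network."""
    return xdot_des + x ** 3


def simulate(x0, desired, gain=2.0, dt=0.05, steps=200):
    """Closed loop: xdot_des = K (x_d - x), then u = inverse_model(x, xdot_des)."""
    x, history = x0, []
    for k in range(steps):
        xdot_des = gain * (desired(k * dt) - x)   # gain times tracking error
        u = inverse_model(x, xdot_des)            # control from the inverse model
        x += dt * plant_rate(x, u)                # Euler step at 20 Hz
        history.append(x)
    return history


hist = simulate(0.0, lambda t: 1.0)
```

With an exact inverse, the nonlinearity cancels and the closed loop behaves like the first-order system $\dot{x} = K(x_d - x)$, so the state converges to the desired value. Any error in the inverse model, especially for requested rates it was never trained on, enters this loop directly.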

When we used the direct inverse network generated using the above approach, the resulting
control loop was unstable. This result was surprising given the very good approximation of the
inverse model. However, after a closer analysis, it was found that since the network was trained
only to generate controls for feasible rates of change of the states, the inverse model network
performance outside this region could not be predicted. This is the problem of the non-existence of
the inverse discussed at the beginning of this chapter.

A  Stable  Inverse  Model  Network 


To remedy the problem that the inverse may not exist, we have to provide the network with
some training data in the region outside the feasible set of rates of change of the states. As
discussed at the beginning of this chapter, we can use a criterion for selecting the control as a
function of the current state. For example, we can use feedback error learning (Kawato and Gomi,
1992) to train the network. We tried here an off-line approach to train the HyperBF networks.
Part of the training examples were generated in the following way:

•  Select  random  states  and  desired  rates  of  change  of  the  states  within  a  specified  range. 

• Linearize the nonlinear aircraft model around the chosen state by computing the Jacobian of
the nonlinear dynamics at the current state using numerical methods. The linear aircraft
model is described by equation (4.4-1)

$\dot{x}_d = Ax + Bu$  (4.4-1)

where $\dot{x}_d$ represents the desired rate of change of the states and $x$ is the chosen random state.

• Find the control vector $u_r$ using the following equation:

$u_r = f\left((B^T B)^{-1} B^T (\dot{x}_d - Ax)\right)$  (4.4-2)

where the function $f(\cdot)$ limits the values of the controls to within the allowed region. For the
simulations that follow we used a linear function $f(\cdot)$ with unity slope and with hard saturation at
the limits of allowed controls.

• Use the set $(x, \dot{x}_d, u_r)$ as a training example to train the inverse model neural network.

We used a training set of 2000 examples, with 400 examples generated using the above
procedure. The remaining training examples were generated using the direct inverse approach, as
before.
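
The steps of this procedure (numerical Jacobian, least-squares control, hard saturation) can be sketched as follows. This is a minimal Python illustration under stated assumptions and not the code used to build the training set: `toy_plant` is a hypothetical linear plant with three states and two controls, the control limits are invented, and the normal equations are solved in closed form, which only works for two controls. The sketch follows the form of equation (4.4-2) literally, which is exact for a linear plant.

```python
def toy_plant(x, u):
    """Hypothetical linear plant with three states and two controls."""
    return [x[1], -x[0] + u[0], -0.5 * x[2] + 2.0 * u[1]]


def jacobians(f, x, u, h=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    n, m = len(x), len(u)

    def col(fun, v, j):
        vp, vm = v[:], v[:]
        vp[j] += h
        vm[j] -= h
        return [(p - q) / (2 * h) for p, q in zip(fun(vp), fun(vm))]

    a_cols = [col(lambda v: f(v, u), x, j) for j in range(n)]
    b_cols = [col(lambda v: f(x, v), u, j) for j in range(m)]
    a = [[a_cols[j][i] for j in range(n)] for i in range(n)]
    b = [[b_cols[j][i] for j in range(m)] for i in range(n)]
    return a, b


def saturate(value, lo, hi):
    """Unity slope inside [lo, hi], hard saturation outside."""
    return max(lo, min(hi, value))


def training_control(f, x, xdot_des, u_nominal, limits):
    """u = sat((B^T B)^-1 B^T (xdot_des - A x)) for a two-control system."""
    a, b = jacobians(f, x, u_nominal)
    n = len(x)
    resid = [xdot_des[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Normal equations (B^T B) u = B^T resid, solved in closed form for m = 2.
    g11 = sum(b[i][0] ** 2 for i in range(n))
    g12 = sum(b[i][0] * b[i][1] for i in range(n))
    g22 = sum(b[i][1] ** 2 for i in range(n))
    r1 = sum(b[i][0] * resid[i] for i in range(n))
    r2 = sum(b[i][1] * resid[i] for i in range(n))
    det = g11 * g22 - g12 * g12
    u = [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]
    return [saturate(ui, lo, hi) for ui, (lo, hi) in zip(u, limits)]


x = [1.0, 0.0, 2.0]
xdot_des = [5.0, 0.5, 100.0]   # first component is not achievable by any control
u_wide = training_control(toy_plant, x, xdot_des, [0.0, 0.0], [(-10.0, 10.0), (0.0, 100.0)])
u_sat = training_control(toy_plant, x, xdot_des, [0.0, 0.0], [(-1.0, 1.0), (0.0, 10.0)])
```

The triple `(x, xdot_des, u_sat)` would then be one training example that tells the inverse model what to output when the requested rate lies outside the feasible set.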

Results 

We tested the inverse model HyperBF network generated using the above procedure for
tracking three different trajectories. The control loop used is as illustrated in figure 4.4-1. Desired
trajectories were generated as follows:

• Either a desired altitude or speed trajectory was first specified; the other variable was kept
constant. We used a fifth order polynomial to generate a smooth desired trajectory.

• The angles of attack required to trim the aircraft at the initial and final states were specified.
The angle of attack trajectory was taken to be the linear interpolation between the two trim
angles of attack.

•  The  pitch  angle  and  pitch  rate  were  computed  from  the  other  three  states  to  achieve  the 
desired  trajectory. 
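
A fifth order polynomial of the kind mentioned in the first step can be sketched as below. The boundary conditions are not stated in the text, so the choice made here (zero rate and zero acceleration at both ends, giving the profile $10\tau^3 - 15\tau^4 + 6\tau^5$) is an assumption, as is the function name `quintic_trajectory`.

```python
def quintic_trajectory(start, end, duration, t):
    """Smooth fifth-order polynomial from start to end.

    Assumes zero rate and zero acceleration at both ends.
    Returns the desired value and its time derivative at time t.
    """
    tau = min(max(t / duration, 0.0), 1.0)          # normalized time in [0, 1]
    s = 10 * tau ** 3 - 15 * tau ** 4 + 6 * tau ** 5
    ds = (30 * tau ** 2 - 60 * tau ** 3 + 30 * tau ** 4) / duration
    return start + (end - start) * s, (end - start) * ds


# Altitude command of Example 1: 20,000 ft to 40,000 ft in 250 sec.
h_mid, h_dot_mid = quintic_trajectory(20000.0, 40000.0, 250.0, 125.0)
```

Under these assumed boundary conditions the climb rate peaks at mid-trajectory, at 150 ft/sec for the altitude command of Example 1.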

It is important to note that the desired trajectories may not have been achievable exactly for
all the states, since we only used kinematic relations between the states to derive the trajectories.
The desired rates of change of states were generated using a simple gain, multiplying the error
between the current state and the desired current state as shown in equation (4.4-3).

$\dot{x}_d(t) = K\,(x_d(t) - x(t))$  (4.4-3)

K  is  assumed  to  be  a  diagonal  matrix.  At  a  sampling  rate  of  20Hz,  the  numerical  values  used  for 
the gains are k^ = 20, kq = 5, k^ = 10, k^ = 10; k^ = 10, * 1
We used these gains for the three examples that we describe below.

Example  1 

The control objective in this example is to track a smooth increase in altitude from 20,000 ft
to 40,000 ft in 250 sec. The speed is required to be constant at a value of 700 ft/sec. The angle of
attack is a linear interpolation between the trim value at (20,000 ft, 700 ft/sec) and (30,000 ft, 700
ft/sec). The pitch angle is derived by first computing the flight path angle $\gamma(t)$ from the desired
altitude and speed trajectories and then computing $\theta(t) = \alpha(t) + \gamma(t)$.
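
The pitch angle computation can be written out in a few lines. The sketch below assumes the standard kinematic relation $\dot{h} = V \sin\gamma$ to obtain the flight path angle from the altitude rate and speed; the text does not state which relation was used, so this choice, the function name `pitch_angle`, and the use of radians are assumptions.

```python
import math


def pitch_angle(alpha, h_dot, speed):
    """theta = alpha + gamma, with the flight path angle from sin(gamma) = h_dot / V.

    alpha is in radians; h_dot and speed must be in the same units (e.g. ft/sec).
    """
    gamma = math.asin(h_dot / speed)
    return alpha + gamma


# Level flight: the pitch angle equals the angle of attack.
theta_level = pitch_angle(0.1, 0.0, 700.0)
# Climbing at 150 ft/sec while flying at 700 ft/sec.
theta_climb = pitch_angle(0.1, 150.0, 700.0)
```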

Using the gains mentioned above and the trained HyperBF inverse networks, the
trajectories obtained for the different variables are plotted in figure 4.4-2. The dotted lines represent
the desired ideal trajectories and the solid lines represent the simulated trajectories using the inverse
dynamics HyperBF neural networks. As shown in the figures, the simulated trajectories are very
close to the desired ones. The discontinuities shown in the figure are due to the nonlinear software
simulation of the dynamics and not due to the controller.




Figure 4.4-2: Comparison Between Desired (Dotted Line) And Simulated (Solid
Line) Trajectories For Tracking An Altitude Command On Nonlinear Simulator

Example  2 


In this example, we desire to track a speed change from 500 ft/sec to 800 ft/sec in 20
sec, while keeping the altitude constant at 30,000 ft. The desired and the simulated trajectories are
shown in figure 4.4-3. As shown in the figures, the simulated trajectories approach the
desired ones very well. Again, the discontinuities in the different trajectories are due to discontinuities in the
simulation itself and not the inverse model controller.


Figure  4.4-3:  Comparison  Between  Desired  (Dotted  Line)  And  Simulated  (Solid 
Line)  Trajectories  For  Tracking  A  Slow  Speed  Command 

Example 3

This example is similar to the previous one, except that in this simulation it is required to
achieve the desired trajectory in 10 sec only. This will test the inverse model controller near the
limits of the controls. Again in this case, the performance was still satisfactory. The thrust
computed using the inverse model HyperBF network exceeded the maximum value of 60,000 lbs
and had to be limited at this value for the simulation. This shows the ability of the trained HyperBF
inverse dynamic model to extrapolate the controls necessary to achieve a high rate of change of the
states (the network was only trained for controls up to 60,000 lbs). The results of the simulation
are shown in figure 4.4-4.

Figure 4.4-4: Comparison Between Desired And Simulated Trajectories For
Tracking A Fast Speed Command


4.5 Summary and Conclusions

In this chapter we presented several procedures for generating an inverse dynamic model.
We used HyperBF networks to model the longitudinal inverse dynamics of an F/A-18 simulator.
The HyperBF networks with Gaussian units were able to generalize very well, given the extreme
sparsity of the training examples. Theoretically, this is only possible if the inverse dynamics as a
function of the states and the rates of change of the states is smooth. If the inverse dynamics are not
smooth, it may be desirable to approximate them with a smoother function to achieve better and faster
learning of the inverse function. Since the inverse dynamics do not exist everywhere, it is also
necessary to provide the network with examples where the strict inverse does not exist. Therefore,
the direct inverse model for training a neural network is not adequate and may result in unstable
systems in most cases, if not augmented with an additional controller to generate controls when the
desired trajectories are not achievable.

5. CONCLUSIONS AND PHASE II RECOMMENDATIONS

5.1 Phase I Contributions

In this phase I effort we investigated the feasibility and possible advantages of using
different neural network architectures for performing different control functions. Our major
contributions in this phase are the following:

•  Studying  the  properties  of  the  different  techniques  for  implementing  real-time  optimal 
control  using  dynamic  neural  networks  of  the  Hopfield-type. 

• Developing and testing a new constraint-satisfying dynamic neural network, based on the
Lagrange multiplier method, that is suitable for real-time adaptive optimal control.
• Proposing and testing different alternatives for implementing system parameter identification
using different neural network architectures. We conclude that dynamic neural networks of
the Hopfield type can perform parameter ID, but they offer little advantage over other
real-time computational techniques for small size systems. We propose a parameter ID network
based on associative memory, which is trained to estimate the network parameters based on
the current state of the dynamical system. This network does not need persistence of
excitation for adequate identification, but it is very slow at recognizing rapid changes in
system dynamics. It can augment the performance of a conventional parameter ID system
and help in failure detection.

• Studying different techniques for training and implementing an inverse dynamics neural
network and investigating the advantages and disadvantages of these techniques.

• Implementing and testing a novel inverse trim model based on a HyperBF neural network.

• Successfully implementing an inverse dynamics model for the F/A-18 longitudinal dynamics
using Gaussian HyperBF neural networks.

•  Testing  the  HyperBF  controller  for  the  control  of  three  different  trajectories. 

5.2  Conclusions 

We believe that the biggest potential for neural networks in control is the exploitation of the
ability to design adaptive nonlinear controllers. Based on the simulations performed in this phase I
study, we show that Hopfield and RBF feedforward network architectures may have great
potential in the control of nonlinear systems. In particular, a Hopfield implementation of the Lagrange
multiplier method is suitable for real-time adaptive optimal control. Similarly, RBF feedforward
neural network architectures are suitable for learning inverse dynamics and inverse trim in aircraft
FCS applications. In addition, RBF feedforward networks are easier to train than backpropagation sigmoid
networks, since the RBF formulation results in linear parameters.

The initial simulations we performed show very promising results, as exemplified by the
small control errors in closed-loop simulations using the nonlinear F/A-18 longitudinal dynamics.
Further studies are needed to test the applicability of the techniques to real world problems and to
study the robustness, stability and general reliability of the proposed neural techniques. Neural
networks by themselves cannot be the panacea for all nonlinear control problems. An effort has
to be made to incorporate all the available knowledge about the dynamic system to achieve good
performance. In the next section we propose several extensions to the current effort to analyze and
improve the performance of neural network controllers.

5.3 Phase II Recommendations

We describe below several extensions to the research reported here which may help solve
many of the problems associated with the current neural network approaches. Table 5.3-1
summarizes the recommendations we propose for a phase II study and relates these
recommendations to what has been achieved in phase I. The proposed phase II recommendations
are discussed in more detail in the next section.

Use  of  domain  specific  knowledge 

In this phase I study, our main focus was the proof of the concept that neural networks can
be used successfully in control. We mainly used brute force learning and optimization techniques
to prove our ideas. In phase II, we propose to use more specific knowledge about flight control in
designing the architecture of the neural networks, in their training, and in their optimization. We believe that
the inclusion of domain specific knowledge, such as relationships between variables or known
effects of a particular input variable, can reduce the amount of training considerably and increase
the accuracy of the generated neural networks.

Integration of the different modules

In the Phase I effort, we tested the performance of each neural network module separately.
However, we did not test the performance of the whole dynamic system. In Phase II, we propose
to test the complete control system for performing large maneuvers. We will simulate situations
such as failures to test the robustness and adaptation of the proposed techniques. We also propose
to develop and incorporate a handling qualities model in the control loop.

Improving the robustness of the neural network modules

Under the current study, we used ideal simulations with no noise to train and test the
performance of the developed neural network modules. The performance of these systems may be
different in real-time applications where noise is present. In Phase II, we propose to study the
robustness of the neural network modules to noise and to explore different methods to improve
their performance when noise is present. Methods that we plan to explore include: artificially
inducing noise in training, using domain-specific knowledge, exploring different architectures that
are more immune to noise, and adding a noise robustness term in the objective function used to
train the neural networks.
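
The first of these options, artificially inducing noise in training, can be sketched as a data-augmentation step. The function name `augment_with_noise`, the zero-mean Gaussian noise model, and the parameters `sigma` and `copies` are illustrative assumptions and are not part of the Phase I software.

```python
import numpy as np

def augment_with_noise(X, y, sigma=0.05, copies=4, seed=0):
    """Return the training set plus `copies` noisy replicas of the inputs.

    Assumption: zero-mean Gaussian noise whose standard deviation is `sigma`
    times the per-dimension standard deviation of X. Targets are unchanged.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scale = sigma * X.std(axis=0)                      # per-dimension noise level
    noisy = [X + rng.normal(0.0, 1.0, X.shape) * scale for _ in range(copies)]
    X_aug = np.vstack([X] + noisy)                     # clean data first, then replicas
    y_aug = np.concatenate([y] * (copies + 1))
    return X_aug, y_aug
```

A network trained on `X_aug, y_aug` sees each target paired with slightly perturbed inputs, which may make it less sensitive to input noise; how much noise is appropriate would have to be determined experimentally.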

On-line  design  of  optimal  controllers 

Many of the techniques used in designing control systems can be formulated as the
optimization of an objective function. Pole placement, LQG, loop transfer
recovery, H∞ design, and μ-synthesis are examples of such techniques. In the current study, we
explored different methods for optimization using dynamic neural networks. We propose to extend
these methods to the on-line design of controllers that can be formulated as the optimization of an
objective function.

Using bi-directional associative memories in control

Bi-directional associative memories (BAM) do not classify the different variables as inputs
or outputs to the network. Any variable can be either an input or an output variable. The role of the
BAM network is to predict the values of the missing variables. BAM networks may prove to be
very useful in applications such as an inverse trim network, where the variables specified vary
depending on the control objective. In Phase II, we plan to explore methods for implementing
BAM networks and explore their use in control systems.

Reducing the dimensionality of the inverse dynamics network

In Phase I, we developed full inverse dynamics models. This resulted in a high-dimensional
neural network that requires a large number of data points for training. We also have
shown that there is a smaller dimensional space where the inverse is properly defined. Moreover, it
may be unrealistic to specify the desired rates of change for all states of the dynamic system. In
Phase II, we plan to explore methods for reducing the number of input dimensions in the inverse
model.

On-line learning of the inverse function

In the current work, the training of the inverse dynamics was done off-line and then the
trained inverse model was tested using the nonlinear longitudinal flight simulator. No learning
occurred in real time. In Phase II, we propose to implement continuous training of the inverse
model. Moreover, we propose to test the real-time adaptability of the inverse model network by
varying the system dynamics. We plan to compare both Kawato's feedback error learning and
Jordan's back-propagation through a forward model.

Optimization of optimal control and identification

Many control and identification problems are formulated as the optimization of an objective
function. However, the choice of the objective function itself relies on heuristics and trial-and-error
procedures. In Phase I, we assumed these objective functions to be completely specified by the
control system designer. In a Phase II study, we plan to explore ways of automating the choice of
the objective functions based on more abstract goals. We plan to explore fuzzy logic and neural
network techniques which map the abstract goals into objective function parameters and weights.
Moreover, we propose to explore neural network techniques for multi-objective optimization.
Multi-objective optimization may be useful when there are conflicting goals to optimize, as is the
case in the control-identification tradeoff discussed in the parameter ID chapter.

Different  neural  network  implementations  of  system  ID 

In Phase I, we only tested the implementation of state-space model system ID using
Hopfield optimization networks. In a Phase II effort, we propose to implement different
approaches to parameter ID. In particular, we plan to implement S4ID using Hopfield optimization
networks and implement parameter ID based on associative memories.

Expanded aircraft simulations

In the current effort, we limited our simulations to the longitudinal dynamics of an F/A-18
fighter aircraft. Moreover, the maneuvers we simulated were limited to the available nonlinear
flight simulator capabilities. Under a Phase II study, we plan to acquire a more detailed flight
simulator, expand our simulations to the aircraft's full dynamics, and simulate a wider range of
maneuvers. This will allow us to test our neuro-controllers under a wide variety of situations and
will allow us to study the limitations of the techniques we developed.

Software implementation of neuro-control systems

The design and implementation of neuro-control modules is a time-consuming and
expensive process. We propose to use the experience we gained in Phase I to develop a
neuro-control toolbox with an advanced user interface, based on our CASYS software design tool.


Table 5.3-1 Features of Phase I and Phase II Efforts

| Feature | Phase I | Phase II |
|---|---|---|
| Objective | Demonstrate applicability of neural network controllers | Produce a neuro-control toolbox; application in aircraft control |
| Domain specific knowledge | Not fully exploited | Exploit prior knowledge about the effect of different controls, relationships between states ... |
| System integration | Tested each module separately | Full test of the complete control system under different flight conditions |
| Robustness | Not tested | Test performance with noise; explore methods of neural network training that improve robustness |
| On-line design of optimal controllers | Hopfield optimal trajectory generation | Pole placement; LQG/LTR; H∞ |
| Bi-directional associative memories (BAM) | Not tested | Use BAM to implement inverse trim |
| Dimensionality of inverse model | Full dimensionality | Reduce dimensionality to improve learning and control |
| On-line tests | Only off-line training | Test on-line training and control |
| Optimization of objective function parameters | Not explored | Explore fuzzy logic; multi-objective optimization |
| Neuro-control toolbox | Prototypes using different languages: C, Fortran, Matlab | CASYS with C++ |

Appendix A: Radial Basis Function Networks for Function Approximation

The Radial Basis Functions (RBF) approach to approximating continuous functions
belongs to the class of non-parametric approximation techniques. Non-parametric techniques also
include feedforward neural networks, projection pursuit regression, nearest neighbor
approximation, and locally weighted regression. RBF approximation consists of modeling an
input-output mapping as a linear combination of radially symmetric functions (Powell, 1987; Girosi and
Poggio, 1990; Broomhead and Lowe, 1988; Moody and Darken, 1989). It was first developed as
an exact interpolation approach; that is, it reproduces the outputs of the given examples exactly.
The output of the interpolating function is described by the following equation:

$$y(x) = \sum_{i=1}^{N} c_i \, \phi(\|x - x_i\|) \qquad \text{(A-1)}$$

where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$ represent the input and output respectively, $\|\cdot\|$ represents the
Euclidean norm, the $c_i$ are the coefficients to be estimated, and $N$ is the number of data points in the
training set.

Depending on the type of RBF used, a polynomial term of the form

$$\sum_{j} \mu_j \, p_j^m(x) \qquad \text{(A-2)}$$

is added to equation (A-1), where the $\mu_j$ are unknown coefficients and the $p_j^m$ are polynomials of
degree $\le m$. In such a case, since the number of parameters is larger than the number of data
points, extra constraints are added to make the parameter estimation problem well
posed (Powell, 1987).
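
In the standard formulation of RBF interpolation with a polynomial term, these constraints require the coefficient vector to be orthogonal to each polynomial evaluated at the data points. As a sketch under that assumption (the count $k$ of polynomial terms is a label introduced here for illustration):

$$\sum_{i=1}^{N} c_i \, p_j^m(x_i) = 0, \qquad j = 1, \dots, k$$

Together with the $N$ interpolation conditions, this gives $N + k$ linear equations for the $N + k$ unknowns $c_i$ and $\mu_j$.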

Examples of RBFs include:

• Gaussians: $\phi(r_{ij}) = \exp(-r_{ij}^2 / c^2)$
• Hardy multiquadrics: $\phi(r_{ij}) = \sqrt{r_{ij}^2 + c^2}$
• Hardy inverse multiquadrics: $\phi(r_{ij}) = 1 / \sqrt{r_{ij}^2 + c^2}$
• Cubic splines: $\phi(r_{ij}) = r_{ij}^3$
• Linear splines: $\phi(r_{ij}) = r_{ij}$
• Thin plate splines: $\phi(r_{ij}) = r_{ij}^{2k-d} \log(r_{ij})$ for $d$ even and $\phi(r_{ij}) = r_{ij}^{2k-d}$ for $d$ odd, where $2k > d$, $d$ is the dimension of the input, and $k$ is a smoothness parameter

where $r_{ij} = \|x_i - x_j\|$.

Some of these RBFs (e.g., Gaussians and multiquadrics) have an explicit width parameter c
that needs to be determined. However, we can also fix this width parameter, for example to a
value of one, and scale the data instead. In the applications we report in this work, we use Gaussian
RBFs.

To find the coefficients $c_i$ for exact interpolation, we have to invert a square matrix which
is theoretically guaranteed to be nonsingular for a wide class of radial basis functions, given, of
course, distinct data (Micchelli, 1986). Since the equations are linear, a number of batch
and recursive algorithms exist for finding the exact value of the coefficients. The linearity of
the function with respect to the coefficients $c_i$ guarantees convergence to the globally optimal
parameters. Since many of the optimization algorithms require the inversion of a matrix of rank n,
the computational complexity for finding the optimal parameters is O(n³), where n is the number of
data points and also the number of coefficients in the case of exact interpolation. Exact interpolation
may not be desirable if the data are noisy or if the computational burden is high.
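
A minimal sketch of exact interpolation with Gaussian RBFs follows. It assumes the Gaussian is written as exp(-r²/c²); the function names, the test function, and the width value are illustrative choices only.

```python
import numpy as np

def gaussian_rbf_matrix(X, centers, c=1.0):
    """Matrix of phi(||x - x_i||) = exp(-r^2 / c^2) for all point/center pairs."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / c**2)

def fit_exact(X, y, c=1.0):
    """Solve Phi c = y. Phi is N x N, so the cost grows as O(N^3)."""
    Phi = gaussian_rbf_matrix(X, X, c)
    return np.linalg.solve(Phi, y)

def predict(X_new, X, coeffs, c=1.0):
    """Evaluate the interpolant (A-1) at new inputs."""
    return gaussian_rbf_matrix(X_new, X, c) @ coeffs

# Usage: interpolate 8 samples of a sine wave and check the fit at the data.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
coeffs = fit_exact(X, y, c=0.2)
residual = np.max(np.abs(predict(X, X, coeffs, c=0.2) - y))
```

The interpolant passes through every training point up to round-off, which is also why noise in the data is reproduced rather than smoothed.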

A.1 Mathematical Interpretation of RBFs

As mentioned by Poggio and Girosi (1989), many of the radial basis functions are the
Green's functions obtained by solving different regularization problems of the form:

$$\sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|Pf\|^2 \qquad \text{(A.1-1)}$$




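where P in the above equation is a radially symmetric differential operator. For example, Gaussian
RBFs result from operators P of the form:

$$\|Pf\|^2 = \sum_{m=0}^{\infty} a_m \|P^m f\|^2 \qquad \text{(A.1-2)}$$

where $P^{2m} = \nabla^{2m}$ and $P^{2m+1} = \nabla \nabla^{2m}$, $\nabla^2$ is the Laplacian operator, and the coefficients
$a_m = \sigma^{2m} / (m! \, 2^m)$, with $\sigma$ setting the width of the resulting Gaussian.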

Similar regularization functions may also be derived for other types of RBFs. Since
regularization is also related to Bayesian estimation, we can think of RBFs as a special case of
Bayesian estimation (Girosi, Poggio and Caprile, 1990), where the prior probability of the
function f(x) is assumed to be

$$P(f) \propto \exp\left(-\lambda \|Pf\|^2\right) \qquad \text{(A.1-3)}$$
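
The link between this prior and the regularization problem can be made explicit under one additional assumption that is not stated above: that the observations $y_i$ are corrupted by independent Gaussian noise of variance $\sigma_\varepsilon^2$ (a symbol introduced here). The posterior is then

$$P(f \mid \text{data}) \propto \exp\left(-\frac{1}{2\sigma_\varepsilon^2} \sum_{i=1}^{n} (y_i - f(x_i))^2\right) \exp\left(-\lambda \|Pf\|^2\right)$$

Taking the negative logarithm and multiplying by $2\sigma_\varepsilon^2$ shows that the maximum a posteriori estimate minimizes

$$\sum_{i=1}^{n} (y_i - f(x_i))^2 + 2\sigma_\varepsilon^2 \lambda \|Pf\|^2$$

which has the same form as the regularization functional, with the regularization weight rescaled by the noise variance.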

Another interesting interpretation for some forms of RBFs, made by Schagen (1980), is to
regard the given training examples as point realizations of a stationary stochastic process Z(x).
The stationarity of Z(x) implies that the mean and variance of the process at any point are constants
and that the covariance between two points is only a function of the difference between these two
points. If we make the stronger assumption that the covariance between two points $g(x_i, x_j)$
depends only on the distance between these two points (i.e., $g(x_i, x_j) = g(\|x_i - x_j\|)$), and given
that the function g(r) satisfies the covariance properties, namely $g(0) = 1$, $g(r) \le 1$ for $r \ge 0$, and the
covariance matrix is nonnegative definite, some RBF solutions may be interpreted as the best linear
unbiased estimate of the stochastic process given the data points. For the derivation of this result,
we refer the reader to the paper by Schagen (1980). Gaussian RBFs satisfy all these
assumptions and constraints. Note that not all RBF functions mentioned above satisfy all the
assumptions required for this interpretation. RBFs with increasing function values as the distance
from the center increases cannot model a covariance, since $g(r) > g(0)$ for some $r > 0$.
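
As a sketch of why such an estimate takes the RBF form, assume (a simplification introduced here, not taken from the discussion above) that Z(x) has zero mean and unit variance and that the matrix $G$ defined below is nonsingular. The best linear unbiased estimate at a new point $x$ from the observations $z = (z_1, \dots, z_N)^T$ is then

$$\hat{Z}(x) = \sum_{i=1}^{N} c_i \, g(\|x - x_i\|), \qquad c = G^{-1} z, \qquad G_{ij} = g(\|x_i - x_j\|)$$

This is a linear combination of radial functions centered at the data points, with the covariance function $g$ playing the role of the basis function.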

Both of the above interpretations of radial basis functions make some a priori assumptions
about the degree of smoothness of the function to be approximated. These a priori assumptions
determine the shape of the radial basis function used.

A.2 Extensions to RBF: GRBF and HyperBF

To reduce the problems of exact interpolation, many researchers have suggested using a
smaller number of basis functions (Broomhead and Lowe, 1988; Girosi and Poggio, 1989; Moody
and Darken, 1989). In this case it is not possible to reproduce the exact outputs in general. To
choose the centers of the basis functions we can use optimization techniques (Girosi and Poggio,
1989) or heuristic algorithms based on the distribution of the data (Moody and Darken, 1989).
The RBF approximation with movable centers has been called Generalized Radial Basis Functions
(GRBF) (Girosi and Poggio, 1989). The estimation of the location of the centers of the GRBFs
using least square error optimization techniques is not an easy problem. This is because the error
surface is not convex, and the number of parameters to be estimated is large. From our experience
with different computer simulations and using second-order nonlinear optimization techniques, we
found that it is very hard to adjust the centers and that usually the gain in performance is small.
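
When the centers are held fixed rather than optimized, the coefficients remain a linear least-squares problem. The sketch below illustrates that simpler case only; it does not move the centers. The Gaussian form exp(-r²/c²), the evenly spaced centers, and the test data are assumptions made for illustration.

```python
import numpy as np

def design_matrix(X, centers, c):
    """N x n matrix of Gaussian units exp(-r^2 / c^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / c**2)

def fit_fixed_centers(X, y, centers, c):
    """Least-squares coefficients for n < N fixed centers."""
    Phi = design_matrix(X, centers, c)
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coeffs

# Usage: 200 noisy samples approximated with only 10 basis functions.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=200)
centers = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)   # hypothetical center choice
coeffs = fit_fixed_centers(X, y, centers, c=0.4)
rms_error = np.sqrt(np.mean((design_matrix(X, centers, 0.4) @ coeffs - y) ** 2))
```

With fewer basis functions than data points the outputs are no longer reproduced exactly, and the fit smooths the noise instead of interpolating it.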

Another extension to the RBF approach, also described by Poggio and Girosi (Girosi and
Poggio, 1989), is known as Hyper Basis Functions (HyperBF). This is a further generalization of
the GRBF technique, which includes using radial basis functions of different widths or also
non-radial basis functions. Similar types of basis functions have been described by Saha, Christian,
Tang, et al. (1991) for image coding and analysis and have been termed Oriented Non-Radial Basis
Functions (ONRBF). Saha, et al. (1991) suggest a gradient descent algorithm for finding the
parameters of the ONRBFs. Varying the widths of the RBFs is also equivalent to using a general
norm rather than the Euclidean norm to compute the distance of a point from the center of the basis
function.

The equation describing the output in terms of the basis functions and the different inputs is
as follows:

$$y(x) = \sum_{i} c_i \, \phi(\|x - x_i\|_W) \qquad \text{(A.2-1)}$$

where

$$\|x - x_i\|_W^2 = (x - x_i)^T W^T W (x - x_i)$$

and W is a square matrix.

From our practical experience, we find that the W matrix plays a very important role in
the quality of generalization. This is especially true for functions which do not meet the
smoothness assumptions implied a priori by using a certain type of RBF.

We use HyperBF networks with Gaussian units and a diagonal weight matrix W in
modeling the aircraft longitudinal inverse dynamics, the inverse trim function, and the associative
parameter ID module. In the next section we describe different methods for estimating the diagonal
weight matrix W, given the input-output data.
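
For a diagonal W the weighted norm reduces to a per-dimension scaling of the inputs, as the following sketch shows. The function name `hyperbf_design` and the Gaussian form exp(-r²) applied to the weighted distance are assumptions made for illustration.

```python
import numpy as np

def hyperbf_design(X, centers, w):
    """Gaussian units exp(-||x - t||_W^2) with W = diag(w).

    For diagonal W, (x - t)^T W^T W (x - t) = sum_i (w_i * (x_i - t_i))^2,
    so each input dimension is simply scaled by its own weight.
    """
    diff = (X[:, None, :] - centers[None, :, :]) * w
    return np.exp(-(diff ** 2).sum(axis=2))
```

A large `w[i]` makes the units narrow along input dimension i, and `w[i] = 0` makes the output independent of that dimension.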

A.3  Estimating  a  Diagonal  Weight  Matrix  W  for  Gaussian  RBFs 

There are many possible ways of estimating the weight matrix W. One class of methods,
based on optimization techniques, is to find a W which minimizes the sum of the squares of the
errors between the output of the RBFs and the training set output. Nonlinear optimization
techniques that might be used include gradient descent, second-order nonlinear optimization
techniques, or variations of random search. Gradient and second-order methods are not guaranteed
to converge to the global minimum and are sensitive to the initial choice of parameters. Also, for
large amounts of data and large numbers of RBFs, the amount of computation involved in second-order methods
becomes very large. In random search, the amount of computation for each step is relatively small,
but a very large number of steps may be needed to converge, especially when the number of
parameters to be estimated is large. The main advantage of random search is that it can escape from
local minima. Caprile and Girosi (1990) present a simple random search technique that has been
found to work well in practice.
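
The following is a generic random-search sketch for a diagonal W; it is not the specific algorithm of Caprile and Girosi (1990). The acceptance rule (keep a perturbation only if the training error drops), the Gaussian perturbations, and all names and default values are assumptions.

```python
import numpy as np

def sse_for_w(w, X, y, centers):
    """Training sum of squared errors for Gaussian units with W = diag(w)."""
    diff = (X[:, None, :] - centers[None, :, :]) * w
    Phi = np.exp(-(diff ** 2).sum(axis=2))
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear coefficients for this w
    return float(((Phi @ coeffs - y) ** 2).sum())

def random_search_w(X, y, centers, w0, steps=200, step_size=0.3, seed=0):
    """Keep a random perturbation of w only when it lowers the training error."""
    rng = np.random.default_rng(seed)
    w_best = np.asarray(w0, dtype=float)
    e_best = sse_for_w(w_best, X, y, centers)
    for _ in range(steps):
        # Gaussian steps are unbounded, so a trial can land outside the current basin.
        w_try = np.abs(w_best + step_size * rng.normal(size=w_best.shape))
        e_try = sse_for_w(w_try, X, y, centers)
        if e_try < e_best:
            w_best, e_best = w_try, e_try
    return w_best, e_best
```

Each step costs one linear least-squares solve, which is cheap, but the number of steps needed grows quickly with the number of weights.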

An alternative method for determining the best diagonal weight matrix W is to use
cross-validation techniques to estimate the $w_i$'s in the different input dimensions. This has the
advantage over minimum training error techniques that it attempts to minimize the predicted
mean square error, as opposed to the mean square error of the training set only, and therefore may
generalize better. Hutchinson, Kalma and Johnson (1984) have attempted to use generalized
cross-validation to find scaling parameters for one input variable. More work is needed to generalize this
technique to more than one input variable in an efficient way. However, in general,
cross-validation techniques tend to be computationally very expensive, and it may be very difficult to
adapt these techniques to real-time applications.
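
The cost of cross-validation can be seen in a plain leave-one-out sketch, which refits the model once per data point for every candidate weight. This is ordinary leave-one-out cross-validation, not the generalized cross-validation of Hutchinson et al.; the single common scale, the fixed centers, and all names are assumptions.

```python
import numpy as np

def design(X, centers, w):
    """Gaussian units exp(-||x - t||_W^2) with W = diag(w)."""
    diff = (X[:, None, :] - centers[None, :, :]) * w
    return np.exp(-(diff ** 2).sum(axis=2))

def loo_mse(X, y, centers, w):
    """Leave-one-out mean square prediction error for a given diagonal w."""
    errors = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                  # hold out point i
        Phi = design(X[keep], centers, w)
        coeffs, *_ = np.linalg.lstsq(Phi, y[keep], rcond=None)
        pred = design(X[i:i + 1], centers, w) @ coeffs
        errors.append((pred[0] - y[i]) ** 2)
    return float(np.mean(errors))

def select_scale(X, y, centers, scales):
    """Pick the common scale s (w = s * ones) with the lowest leave-one-out error."""
    d = X.shape[1]
    scores = [loo_mse(X, y, centers, s * np.ones(d)) for s in scales]
    return scales[int(np.argmin(scores))], scores
```

Searching over a separate weight for every input dimension multiplies the number of candidates, which is the efficiency problem noted above.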

Many researchers have explored heuristic methods for the estimation of the RBF
widths and center locations. Moody and Darken (1989) describe methods based on adaptive
clustering of the input data. In their analysis, they totally ignore the characteristics of the function
to be approximated. Platt (1991) describes a resource-allocating network that adaptively adds basis
functions based on a novelty measure. The novelty measure is based on two factors: the accuracy
of the approximation and the distance of the new experience from the previous data points. The
width of the RBFs is proportional to the distance to the k-nearest neighbor. Although the output
data in this method are used in the choice of the center locations, the estimation of the RBF widths
is still completely dependent on the input distribution only. Hutchinson (1993) proposes a heuristic
algorithm for finding a reasonable set of initial values of the parameters of the RBFs. His algorithm
is a generalization of the Moody and Darken algorithm and allows for the possibility of estimating the
widths of RBFs based on the output as well as the input data. Methods that depend only on the
input distribution to determine the RBF width parameters may not work well if the dependence of
the function to be approximated on the different directions of the input space is not uniform. Mel
and Omohundro (1991) describe a method that depends on the second-order derivatives of the
function to be approximated with respect to the different input variables.
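
A width rule of the nearest-neighbor kind can be sketched as follows. The function name `knn_widths` and the proportionality constant `alpha` are illustrative; the sketch is not taken from any of the cited papers.

```python
import numpy as np

def knn_widths(centers, k=2, alpha=1.0):
    """Width of each unit = alpha * distance to its k-th nearest other center."""
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    d_sorted = np.sort(d, axis=1)        # column 0 is the zero self-distance
    return alpha * d_sorted[:, k]
```

The rule uses only the positions of the centers, so it assigns the same width in every input direction regardless of how the target function varies, which is the limitation discussed above.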




In the simulations reported in this work, we use a heuristic technique for
estimating a diagonal W for Gaussian RBFs. This method is based on approximating the first-order
partial derivatives of the function to be approximated with respect to the different input
variables. Although this method is not proven to optimize any cost function, it is found to
approximate and sometimes surpass the results obtained using nonlinear optimization techniques
(Botros and Atkeson, 1991). The diagonal width parameters are assumed to depend on the average
variation of the function in each direction, as measured by the sum of the squares of the first partial
derivatives in each direction, in addition to the variance of the input data in the different
dimensions. Define the average gradient g to be:


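g = [right-hand side not legible in the source]   (A.3-1)

and the normalized vector $\hat{g} = g / \|g\|$. We find empirically that a good approximation for the
diagonal of W is as follows:

$$w_{ii} = \hat{g}_i \, \frac{k}{\sqrt{E\left((x_i - t_i)^2\right)}} \qquad \text{(A.3-2)}$$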


where the subscript i denotes the i-th input variable, E(·) represents the expected value, and $t_i$ is
the i-th component of the centers of the RBFs. The parameter k is a constant that can be determined
by cross-validation or least mean square optimization. From simulations we find that the
approximation is not very sensitive to the value of k over a range of values. However, the choice of a good k
is important for the conditioning of the coefficients of the RBF. The bigger the value of k, the
smaller the equivalent width of the RBFs, and the less nearly singular the estimation of the
coefficients.
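
The weight formula (A.3-2) can be evaluated as in the sketch below. Two points are assumptions rather than statements of the method: the average gradient component is taken to be the root mean square of the corresponding partial derivative over the data, and the expectation is taken as the mean over all data-point/center pairs. The function name and argument layout are illustrative.

```python
import numpy as np

def heuristic_w(grads, X, centers, k=1.0):
    """Diagonal of W: w_ii = k * g_hat_i / sqrt(E((x_i - t_i)^2)), cf. (A.3-2).

    grads   : (N, d) first partial derivatives of f at the data points
    X       : (N, d) input data
    centers : (n, d) RBF centers t
    Assumption: g_i is the root mean square of the i-th partial derivative.
    Assumption: E(.) is the mean over all data-point/center pairs.
    """
    g = np.sqrt((grads ** 2).mean(axis=0))
    g_hat = g / np.linalg.norm(g)                       # normalized gradient vector
    spread = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).mean(axis=(0, 1)))
    return k * g_hat / spread
```

Doubling `k` doubles every weight, which halves the equivalent width of the Gaussian units in every direction.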

To understand why this suggested form of W makes sense, we can divide the equation for
W into two terms. The first term is the normalized gradient functional and the second is a
normalization factor that normalizes the input space so that the different inputs have approximately
similar ranges of values. The gradient functional term results in making the first-order terms in the
regularizer operator for Gaussian RBFs (equation A.1-2) have approximately equal magnitude. This,
in turn, satisfies the assumptions made by the regularizer. Intuitively, the width of the Gaussian
functions will be smaller in the directions where the approximated function varies the most. Note
also that if one variable is irrelevant, its derivative function will be zero, and therefore the
corresponding w component will also be zero.

The idea of separating the estimation of the norm metric from the estimation of the other
function approximation parameters has been recognized by many other researchers (Girosi, 1992;
Moody and Darken, 1989; Samarov, 1991; Li, 1992; Zhao, 1992). Some of these researchers
have also suggested the use of different forms of derivative functionals for other function
approximation techniques (Samarov, 1991; Zhao, 1992; Li, 1992). Both Samarov and Li have
described methods for estimating these derivative functionals or expected derivatives from the data
and mentioned the assumptions under which these estimations are valid. We use an iterative
technique for the estimation of the matrix W. This technique starts by first estimating the function
to be approximated using a diagonal W matrix equal to the inverse of the variance of the input data
in the different input directions, and then estimating the derivative functional using the
approximated function. We then use the derivative functional to update the value of the W matrix
and use the latter to improve the function approximation. In practice, this procedure usually
converges in 3 or 4 iterations. However, a more detailed mathematical analysis is needed to
understand the convergence properties of this algorithm.
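
The iterative procedure can be sketched as below. The sketch assumes Gaussian units exp(-||x - t||_W^2) with fixed centers, uses the analytic gradient of the fitted model as the derivative estimate, and takes the average gradient to be the root mean square of each partial derivative; all of these, along with the function names and defaults, are assumptions rather than details confirmed by the text.

```python
import numpy as np

def design(X, centers, w):
    """Gaussian units exp(-||x - t||_W^2) with W = diag(w)."""
    diff = (X[:, None, :] - centers[None, :, :]) * w
    return np.exp(-(diff ** 2).sum(axis=2))

def model_gradients(X, centers, w, coeffs):
    """Analytic partial derivatives of f(x) = sum_j c_j exp(-||x - t_j||_W^2)."""
    delta = X[:, None, :] - centers[None, :, :]            # shape (N, n, d)
    phi = design(X, centers, w)                            # shape (N, n)
    return np.einsum("nj,j,njd->nd", phi, coeffs, -2.0 * w**2 * delta)

def iterate_w(X, y, centers, k=1.0, iterations=4):
    """Alternate between fitting the coefficients and updating diag(W)."""
    w = 1.0 / X.var(axis=0)                                # start from inverse input variance
    spread = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).mean(axis=(0, 1)))
    for _ in range(iterations):
        coeffs, *_ = np.linalg.lstsq(design(X, centers, w), y, rcond=None)
        grads = model_gradients(X, centers, w, coeffs)     # derivatives of the current fit
        g = np.sqrt((grads ** 2).mean(axis=0))             # assumed form of the average gradient
        w = k * (g / np.linalg.norm(g)) / spread           # update, cf. (A.3-2)
    coeffs, *_ = np.linalg.lstsq(design(X, centers, w), y, rcond=None)
    return w, coeffs
```

On data where one input has no influence on the output, the weight for that input is expected to shrink toward zero over the iterations, so the fitted model gradually ignores it.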